The causal meaning of Fisher's average effect 
James J. Lee*

Carson C. Chow

Laboratory of Biological Modeling
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
Bethesda, MD 20892, USA

*To whom correspondence should be addressed; E-mail: leejj5@mail.nih.gov.



RESEARCH PAPER 

RUNNING HEAD: Causal meaning of average effect 



Summary 

In order to formulate the Fundamental Theorem of Natural Selection, Fisher defined 
the average excess and average effect of a gene substitution. Finding these notions to 
be somewhat opaque, some authors have recommended reformulating Fisher's ideas in 
terms of covariance and regression, which are classical concepts of statistics. We argue 
that Fisher intended his two averages to express a distinction between correlation and 
causation. On this view the average effect is a specific weighted average of the actual 
phenotypic changes that result from physically changing the allelic states of homologous 
genes. We show that the statistical and causal conceptions of the average effect, perceived 
as inconsistent by Falconer, can be reconciled if certain relationships between the genotype 
frequencies and non-additive residuals are conserved. There are certain theory-internal 
considerations favoring Fisher's original formulation in terms of causality; for example, 
the frequency-weighted mean of the average effects equaling zero at each locus becomes 
a derivable consequence rather than an arbitrary constraint. More broadly, Fisher's dis- 
tinction between correlation and causation is of critical importance to gene-trait mapping 
studies and the foundations of evolutionary biology. 

Keywords: quantitative genetics, causality, confounding, selection bias, natural selec- 
tion 



1. Introduction 



Darwin perceived that hereditary variation in fitness leads to an increase in adaptive 
complexity. In an attempt to provide a Mendelian and mathematical formulation of 
this profound insight, Fisher expounded the Fundamental Theorem of Natural Selection 
(FTNS), which in a modern paraphrase states that the partial increase in population mean 
fitness ascribable solely to changes in allele frequencies by natural selection is equal to
the additive genetic variance in fitness (Bennett, 1956; Kimura, 1958; Price, 1972; Ewens,
1989, 2004, 2011; Frank & Slatkin, 1992; Edwards, 1994, 2002; Lessard, 1997; Okasha,
2008). In the discrete-time formulation of the FTNS, the additive genetic variance is
proportional to this partial increase, as it must be divided by the mean fitness. 

In his exposition of the FTNS, Fisher took some pains to define the concepts of average 
excess and average effect. In his own words,

Let us now consider the manner in which any quantitative individual mea- 
surement, such as human stature, may depend upon the individual genetic 
constitution. We may imagine, in respect of any pair of alternative [alleles], 
the population divided into two portions, each comprising one homozygous 
type together with half of the heterozygotes, which must be divided equally 
between the two portions. The difference in average stature between these two 
groups may then be termed the average excess (in stature) associated with the
gene substitution in question. . . . (Fisher, 1930, p. 30, emphasis added)



In contrast, 



[b]y whatever rules . . . the frequency of different gene combinations, may be 
governed, the substitution of a small proportion of the genes of one [allelic] 
kind by the genes of another will produce a definite proportional effect upon 



the average stature. The amount of the difference produced, on the average, 
in the total stature of the population, for each such gene substitution, may 
be termed the average effect of such substitution, in contra-distinction to the
average excess as defined above. . . . (Fisher, 1930, p. 31, emphasis added)



It is natural to conceive [of the average effect] as the actual increase in the total 
of the measurements of a population, when without change in the environment, 
or the mating system, the gene substitution is experimentally brought about,
as it might be by mutation. (Fisher, 1941, p. 373, emphasis added)



This paper addresses a puzzle raised by Falconer (1985) in his brilliant explication of
Fisher's two genetic averages. Falconer assumed that what Fisher meant by the quoted
definition of the average effect was as follows. We randomly sample a zygote immediately
after fertilization but before the onset of any developmental events. If the zygote's
genotype contains a gene of a certain allelic type, say A1, we change it to A2. This
experimental intervention may lead to a value of the focal phenotype at the time of
measurement that differs from what it would have been if the intervention had not been
performed. Falconer reasoned that the expected magnitude of this difference corresponds
to Fisher's verbal definition of the average effect.

Falconer then showed that Fisher's (1941) now widely accepted mathematical defini- 
tion of the average effect — the partial regression coefficient of gene count in the linear 
regression of the phenotype on all loci in the genome — does not generally coincide with 
the definition in terms of experimental gene substitutions performed at random. Falconer 
expressed surprise at the apparent invalidity of the latter definition, given that "Fisher 
uses the imaginary replacement of one allele by another as a verbal description to
introduce the idea of average effect, and it seems to have been seen by him as the basis
for the concept" (p. 334).

Falconer correctly perceived the importance of experimental intervention to Fisher's
conception of the average effect. Indeed, Fisher did not even bother to spell out his
regression definition in the first edition of The Genetical Theory of Natural Selection.
Furthermore, to any reader familiar with Fisher's work on experimental design and his
controversial stance on the tobacco-cancer connection, the quotations given above must
bring to mind his repeated admonition that an observed excess in the average measurement
of one group over another can always be interpreted as the causal effect of the factor
distinguishing the groups under the following circumstance: the allotment of members to
groups has been randomized in a controlled experiment (Fisher, 1935a, 1958a). This
preoccupation with causation is one of the stark contrasts between Fisher and his nemesis
Karl Pearson; contrary to the intellectual fashion of the Edwardian era, Fisher did not
regard causality as a meaningless concept. In the inaugural issue of the journal
Philosophy of Science, the word cause and its derivatives appear in Fisher (1934) no fewer
than seventy times. Over much resistance by seasoned experimenters (Box, 1978), Fisher
advocated randomization in experimental design for the precise purpose of distinguishing
causation from spurious correlations brought about by confounding variables. There is thus
compelling reason to believe that the notion of experimental control revealing causation
is critical to the proper interpretation of the average effect.

We argue that a more nuanced reading of Fisher's writings can bring his experimental 
and regression definitions of the average effect into full agreement in certain special cases. 
We then provide reasons to favor the experimental definition in more general situations. 
A striking disadvantage of the regression definition is that its use invalidates the FTNS 
if some of the variance in fitness has environmental causes. 



For simplicity our main text mostly follows Falconer (1985) in treating the case of a
single locus with two alleles. We provide the generalization to multiple alleles and loci in
two of the later sections. Some interesting new concepts do arise in this generalization,
but the central ideas can be conveyed without multilocus notation, which seems inevitably
to be either cumbersome or opaque.

2. A Notation for Causal Notions 

A formal symbolic language to distinguish causal relations from merely correlational ones,
such as the counterfactual notation of Neyman (1923) and Rubin (2005), was not to our
knowledge ever adopted by Fisher. This is despite the fact that he frequently wrote about
this distinction. Although such formalisms lack the elegance of Fisher's prose, adopting
the appropriate formalism is an aid to understanding.



For this purpose we adopt the do operator of Pearl (1995). We are to interpret an
expression such as E[Y | do(x)] to mean the expectation of Y given that the random
variable X has been experimentally fixed to the value x. The contrast between conditional
quantities containing the do symbol and traditional conditional quantities is evident in
the expressions

\[ P(\text{mud} \mid \text{rain}) > P(\text{mud}) \quad \text{and} \quad P(\text{rain} \mid \text{mud}) > P(\text{rain}) \tag{1} \]

and

\[ P[\text{mud} \mid \mathrm{do}(\text{rain})] > P(\text{mud}) \quad \text{and} \quad P[\text{rain} \mid \mathrm{do}(\text{mud})] = P(\text{rain}). \tag{2} \]

Expression (1) indicates that we are more likely to find mud if we have already observed
rain. Because co-occurrence is symmetric, it also becomes more likely that it has rained
if we have already observed mud. On the other hand, (2) symbolizes the much stronger and
asymmetrical assertion that rain causes mud and not vice versa; muddying up the backyard
with a garden hose will not make it rain.
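The contrast between conditioning and intervening can be made concrete with a small simulation. The following is only a sketch under assumed mechanisms: the 30% chance of rain, the 10% background chance of mud, and the function names `simulate` and `frequency` are all illustrative choices that do not come from the text.

```python
import random

def simulate(n, do_rain=None, do_mud=None, seed=0):
    """Simulate n days as (rain, mud) pairs.

    Passing do_rain or do_mud overrides the natural mechanism for that
    variable, which is how the do operator is modeled in this toy world.
    """
    rng = random.Random(seed)
    days = []
    for _ in range(n):
        # Assumed mechanism: it rains on 30% of days.
        rain = do_rain if do_rain is not None else rng.random() < 0.3
        # Assumed mechanism: rain always makes mud; otherwise mud appears 10% of the time.
        mud = do_mud if do_mud is not None else (rain or rng.random() < 0.1)
        days.append((rain, mud))
    return days

def frequency(days, index, given=None):
    """Relative frequency of variable `index` (0 = rain, 1 = mud), optionally
    restricted to days on which variable `given` was observed to be true."""
    kept = [d for d in days if given is None or d[given]]
    return sum(d[index] for d in kept) / len(kept)

n = 100_000
observed = simulate(n)
p_rain = frequency(observed, 0)
p_mud = frequency(observed, 1)
p_rain_given_mud = frequency(observed, 0, given=1)   # observing mud
p_mud_given_rain = frequency(observed, 1, given=0)   # observing rain
p_mud_do_rain = frequency(simulate(n, do_rain=True, seed=1), 1)   # forcing rain
p_rain_do_mud = frequency(simulate(n, do_mud=True, seed=2), 0)    # the garden hose

print(f"P(mud | rain)      = {p_mud_given_rain:.3f}   P(mud)  = {p_mud:.3f}")
print(f"P(rain | mud)      = {p_rain_given_mud:.3f}   P(rain) = {p_rain:.3f}")
print(f"P[mud | do(rain)]  = {p_mud_do_rain:.3f}")
print(f"P[rain | do(mud)]  = {p_rain_do_mud:.3f}")
```

In this toy world, observing mud raises the frequency of rain well above its base rate, whereas forcing mud leaves the frequency of rain at roughly its base rate, which is the asymmetry between the two pairs of expressions.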



This notation and its associated machinery may be of some benefit in the burgeoning
field of genome-wide association studies (GWAS), where it is important to distinguish
genetic variants with a causal effect on a given phenotype from markers that are merely
associated with the phenotype for other reasons, including linkage disequilibrium (LD)
with a nearby causal variant (Visscher et al., 2012). Letting Y denote the phenotype of
interest, we can say that a genetic variant is a causal variant if the equality

\[ \mathrm{E}[Y \mid \mathrm{do}(A_1A_1)] = \mathrm{E}[Y \mid \mathrm{do}(A_1A_2)] = \mathrm{E}[Y \mid \mathrm{do}(A_2A_2)] \tag{3} \]

does not hold. The expectation is taken over the space of all possible multilocus genotypes 
and environments. Note that the equality does in fact hold for a non-causal marker locus 
in LD with a causal locus. If we could experimentally mutate a randomly chosen zygote's 
genotype at a biologically inert marker locus immediately before the onset of development, 
we would not expect any ensuing change in the phenotype. 
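A toy simulation can illustrate why the equality holds at an inert marker even though the marker is associated with the phenotype. This is a sketch under assumptions made up for illustration: one causal locus adding 2 units per allele, a marker whose allele matches the causal allele on 90% of gametes, and unit-variance noise. None of these numbers or function names come from the text.

```python
import random

rng = random.Random(42)

def gamete():
    """Return (causal allele, marker allele), each coded 0 or 1."""
    causal = int(rng.random() < 0.5)
    # Assumed LD: the marker allele matches the causal allele on 90% of gametes.
    marker = causal if rng.random() < 0.9 else 1 - causal
    return causal, marker

def zygote():
    (c1, m1), (c2, m2) = gamete(), gamete()
    return c1 + c2, m1 + m2          # allele counts at the causal and marker loci

def phenotype(causal_count, marker_count):
    # Assumed mechanism: only the causal locus matters; the marker is inert.
    return 2.0 * causal_count + rng.gauss(0.0, 1.0)

def average(values):
    values = list(values)
    return sum(values) / len(values)

population = [zygote() for _ in range(100_000)]

def observed_mean(marker_count):
    """E(Y | marker genotype): average over zygotes that happen to carry it."""
    return average(phenotype(c, m) for c, m in population if m == marker_count)

def do_marker_mean(marker_count):
    """E[Y | do(marker genotype)]: overwrite the marker in every zygote."""
    return average(phenotype(c, marker_count) for c, _ in population)

def do_causal_mean(causal_count):
    """E[Y | do(causal genotype)]: overwrite the causal locus in every zygote."""
    return average(phenotype(causal_count, m) for _, m in population)

observed_gap = observed_mean(2) - observed_mean(0)    # association through LD
marker_gap = do_marker_mean(2) - do_marker_mean(0)    # intervention at the marker
causal_gap = do_causal_mean(2) - do_causal_mean(0)    # intervention at the causal locus

print(f"observed marker gap: {observed_gap:.2f}")
print(f"do(marker) gap:      {marker_gap:.2f}")
print(f"do(causal) gap:      {causal_gap:.2f}")
```

The observed difference between marker homozygotes is large, yet the interventional difference at the marker is close to zero, while intervening at the causal locus changes the mean phenotype.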

The do notation is more than a convenient means of fixing ideas. The treatise of Pearl
(2009) grounds this symbol in a rich syntax and semantics. From one point of view, the
work of Pearl can be regarded as a vast generalization of Wright's (1968) path analysis.

For simplicity we will speak of events in the life cycle such as fertilization, development, 
and phenotypic measurement as if all individuals experienced each such event at the same 
time — a convention that is appropriate for an organism with a life cycle consisting of 
discrete and non-overlapping generations. We can then speak of selecting one zygote for 
an experimental treatment from all those zygotes making up the current generation. Our 
discussion also applies, however, to organisms with a life cycle consisting of continuous 
and overlapping generations. In this case a quantity such as E[Y | do(A1A1)] is to be
interpreted as the present phenotypic value that a randomly selected organism would have
been expected to obtain if its genotype could have been converted to A1A1 immediately
after its own fertilization. Fisher's own writings suggest the importance of counterfactual
thinking. In a summary of his work on the correlations between relatives, he wrote: "[I]t
should be clearly understood what we mean by a cause of variability. If we say, 'This boy
has grown tall because he has been well fed,' we are not merely tracing out cause and effect
in an individual instance; we are suggesting that he might quite probably have been worse
fed, and that in this case he would have been shorter" (Fisher, 1919, p. 214, emphasis in
original). The do operator bears both interventional and counterfactual interpretations.

If necessary, each organism can be weighted by reproductive value. 

3. Falconer's Interpretation of the Experimental Average Effect

We can use the do operator to symbolize the gene substitutions in Fisher's thought
experiment. Here we use it to review Falconer's understanding of this experiment for a
single biallelic locus. We first note that if genotypic and environmental causes of
phenotypic variation act additively and independently, then quantities such as
E(Y | A1A1) are precisely equal to E[Y | do(A1A1)] at the single causal locus. Until we
say otherwise, we assume the stochastic independence of genotypes and environments.



Following the notation of Fisher (1918), we let P, 2Q, and R denote the respective
frequencies of the genotypes A1A1, A1A2, and A2A2. Given that a zygote's genotype is
A1A1, we write the expected phenotypic effect of changing a gene's allelic type from A1
to A2 as

\[ \Delta Y \mid A_1A_1 \to A_1A_2 = \mathrm{E}[Y \mid \mathrm{do}(A_1A_2), A_1A_1] - \mathrm{E}(Y \mid A_1A_1). \tag{4} \]

There is no contradiction in conditioning on both the observation of A1A1 and the
experimental setting of the genotype to A1A2. This simply means that instead of performing
the experiment on a zygote sampled at random from the entire population, we perform it
specifically on a zygote that would otherwise have borne the genotype A1A1. Similarly,
we define

\[ \Delta Y \mid A_1A_2 \to A_2A_2 = \mathrm{E}[Y \mid \mathrm{do}(A_2A_2), A_1A_2] - \mathrm{E}(Y \mid A_1A_2). \tag{5} \]

The problem with identifying the effect of a gene substitution (as in identifying the
effect of an alteration to any nonlinear causal system) is that the expected change depends
on the context. In other words (4) and (5) are not equal in general. Falconer supposed
that Fisher arrived at the "average effect" of substituting A2 for A1 by averaging (4) and
(5) in the following way. We sample a zygote at random and then select one of its genes
at random. If the chosen gene is of allelic type A2, we leave it alone. If the chosen gene is
of type A1, we change it to A2. The expected phenotypic effect of the gene substitutions
performed under this scheme is thus

\[ \frac{P(\Delta Y \mid A_1A_1 \to A_1A_2) + Q(\Delta Y \mid A_1A_2 \to A_2A_2)}{P + Q}. \tag{6} \]

Falconer pointed out that (6) does not agree with the regression definition of the
average effect that Fisher (1941) gave in an article criticizing Sewall Wright for conflating
the average excess and average effect. This article required explicit expressions for the two
genetic averages in traditional notation, and Fisher obtained an expression for the average
effect adequate for demonstrating its distinctness from the average excess by minimizing
the sum of squares

\[ P[\mathrm{E}(Y \mid A_1A_1) - v + \alpha]^2 + 2Q[\mathrm{E}(Y \mid A_1A_2) - v]^2 + R[\mathrm{E}(Y \mid A_2A_2) - v - \alpha]^2, \tag{7} \]

where v is the regression constant. Using a notation that generalizes to a locus with more
than two alleles, we can express this sum of squares equivalently as

\[ P[\mathrm{E}(Y \mid A_1A_1) - \mu - 2\alpha_1]^2 + 2Q[\mathrm{E}(Y \mid A_1A_2) - \mu - \alpha_1 - \alpha_2]^2 + R[\mathrm{E}(Y \mid A_2A_2) - \mu - 2\alpha_2]^2, \quad \text{where } \mu = \mathrm{E}(Y). \tag{8} \]



In the definition (7), then, the average effect α is the slope in the regression of the
phenotype on gene count. α1 and α2 in (8) are the average effects of the two alleles
individually, a notion to which we will return. For now we simply note that α will turn
out to equal α2 − α1 in magnitude. There is some ambiguity in the literature over whether
the outcome variable in the regression should be defined, as in (8), with the subtraction of
the unconditional phenotypic mean (Fisher, 1958b; Price, 1972; Ewens, 2011). However,
this choice simply adds a constant term to the average effects of the individual alleles, and
this term disappears in the biallelic average effect α = α2 − α1. In our later discussion of
individual average effects, we will give a compelling reason to favor the mean subtraction.
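The two regression formulations and Falconer's random-gene weighting can be compared numerically. The sketch below uses exact rational arithmetic with illustrative inputs that are assumptions rather than data: genotype frequencies P, 2Q, R of 0.4, 0.4, 0.2 and expected phenotypes of 0, 3, 1 for A1A1, A1A2, A2A2. The variable names are likewise hypothetical.

```python
from fractions import Fraction

# Illustrative values (assumptions, not data): genotype frequencies P, 2Q, R
# and expected phenotypes of A1A1, A1A2, A2A2.
P, Q, R = Fraction(2, 5), Fraction(1, 5), Fraction(1, 5)
G11, G12, G22 = Fraction(0), Fraction(3), Fraction(1)

mu = P * G11 + 2 * Q * G12 + R * G22

# Definition (7): slope of the frequency-weighted regression on A2 gene count.
weights, counts, means = [P, 2 * Q, R], [0, 1, 2], [G11, G12, G22]
mean_count = sum(w * x for w, x in zip(weights, counts))
cov = sum(w * (x - mean_count) * (g - mu) for w, x, g in zip(weights, counts, means))
var = sum(w * (x - mean_count) ** 2 for w, x in zip(weights, counts))
alpha = cov / var

# Definition (8): normal equations for the two allelic average effects,
#   (2P + Q) a1 + Q a2        = P d11 + Q d12
#   Q a1        + (Q + 2R) a2 = Q d12 + R d22,
# where d11, d12, d22 are genotype means minus mu; solved by Cramer's rule.
d11, d12, d22 = G11 - mu, G12 - mu, G22 - mu
b1, b2 = P * d11 + Q * d12, Q * d12 + R * d22
det = (2 * P + Q) * (Q + 2 * R) - Q * Q
alpha1 = (b1 * (Q + 2 * R) - Q * b2) / det
alpha2 = ((2 * P + Q) * b2 - Q * b1) / det

# Falconer's weighting of the two substitution effects by P and Q.
falconer = (P * (G12 - G11) + Q * (G22 - G12)) / (P + Q)

print("alpha from (7):        ", alpha)
print("alpha2 - alpha1 by (8):", alpha2 - alpha1)
print("weighted mean of a1, a2:", (P + Q) * alpha1 + (Q + R) * alpha2)
print("Falconer's average:    ", falconer)
```

With these inputs the slope from (7) and the difference α2 − α1 from (8) are both 6/7, the frequency-weighted mean of α1 and α2 is zero, and Falconer's average is 4/3, so the disagreement he noted shows up even in a very small example.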

Perhaps frustrated by Fisher's concise style, Falconer concluded his article by
approvingly quoting Price's (1972) remark that Fisher's ideas can be translated into
well-understood concepts such as covariance and regression without dealing with his
"special" notions of average excess and average effect.

In the following we show that the two definitions of the average effect can be reconciled, 
in the case of genotype-environment independence, for a specific weighting of the two 
possible substitutions. However, if such independence fails to hold, it is not possible to 
dispense with Fisher's "special" definition in terms of experimental gene substitutions. 

4. Fisher's Experimental Average Effect 

Fisher conditioned the gene substitutions in his hypothetical experiment on the "rules" 
by which "the frequency of different gene combinations may be governed." It is this dif- 
ficult subtlety that Falconer did not take into account. In The Genetical Theory Fisher's 
wording seems to imply that it is only the mating scheme that determines how differ- 
ent alleles combine to form whole-genome genotypes. Later he acknowledged that other 
factors also influence the departure of genotype frequencies from random combination of
genes, explicitly mentioning "the partial isolation of sections of the population" (Fisher,
1941, p. 54). The implication for the experimental gene substitutions is that they must be
carried out in a manner that does not disturb the arrangement of alleles into genotypes 
called for by the population's rules of formation. 

The three genotype frequencies sum to unity, as do the frequencies of the two alleles.
Thus, given the frequency of one allele, one more parameter is required to specify the
genotype frequencies. There appears to be complete freedom in the choice of this
parameter. For example, one possibility is Wright's inbreeding coefficient F (Crow &
Kimura, 1956). As we later show, if we require the experimental average effect to coincide
with the regression average effect in the case of genotype-environment independence, then
we must choose the parameter to be λ = Q²/(PR), the ratio of the squared (ordered)
heterozygote frequency to the product of the homozygote frequencies. The parameter λ can
be written in the symmetrical form

\[ \lambda = \frac{P(A_2 \mid A_1)}{P(A_1 \mid A_1)} \cdot \frac{P(A_1 \mid A_2)}{P(A_2 \mid A_2)}, \]

and it attains the constant value of unity if the population mates randomly, a fact first
noted by Hardy (1908).
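The properties of the ratio Q²/(PR) can be checked with exact arithmetic. In the sketch below the function names, the allele frequency 3/10, and the inbreeding coefficient 1/4 are illustrative assumptions; the genotype frequencies for an inbred population use the standard parameterization in terms of Wright's F.

```python
from fractions import Fraction

def lam(P, Q, R):
    """Squared (ordered) heterozygote frequency over the product of homozygote frequencies."""
    return Q * Q / (P * R)

def lam_symmetric(P, Q, R):
    """The same ratio written with conditional probabilities of the partner allele."""
    a2_given_a1, a1_given_a1 = Q / (P + Q), P / (P + Q)
    a1_given_a2, a2_given_a2 = Q / (Q + R), R / (Q + R)
    return (a2_given_a1 / a1_given_a1) * (a1_given_a2 / a2_given_a2)

def genotype_frequencies(p, F=Fraction(0)):
    """P, Q, R for A2 frequency p and inbreeding coefficient F (F = 0 is random mating)."""
    q = 1 - p
    return q * q + F * p * q, (1 - F) * p * q, p * p + F * p * q

random_mating = genotype_frequencies(Fraction(3, 10))
inbred = genotype_frequencies(Fraction(3, 10), Fraction(1, 4))

print("random mating:", lam(*random_mating))
print("inbred:       ", lam(*inbred), "=", lam_symmetric(*inbred))
```

Under random mating the ratio is exactly 1 whatever the allele frequency, while positive inbreeding pushes it below 1 because heterozygotes become rarer.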

Let p = Q + R denote the frequency of A2, and write the population mean of Y as
a function of allele frequency and the rules of combination, μ(p, λ). We now show that
the expression μ(p + dp, λ) − μ(p, λ) is proportional to the average effect, α, obtained
from regression equation (7). In other words the ratio λ must be kept constant under this
manipulation, whatever the population's rules of formation have determined this ratio to
be, in order for the experimental gene substitutions to yield what Fisher intended by the
average effect.

The population mean is given by the expression

\[ \mu = P\,\mathrm{E}[Y \mid \mathrm{do}(A_1A_1)] + 2Q\,\mathrm{E}[Y \mid \mathrm{do}(A_1A_2)] + R\,\mathrm{E}[Y \mid \mathrm{do}(A_2A_2)]. \tag{9} \]

The average effect is then proportional to the change of μ with respect to p while holding
λ constant. We can increase p by carrying out either the intervention A1A1 → A1A2
or A1A2 → A2A2. As detailed in the Appendix, upon noting that the differential of
Q² = λPR for constant λ yields the differential equation

\[ \frac{dP}{P} + \frac{dR}{R} = \frac{2\,dQ}{Q}, \tag{10} \]

we find that Fisher's average effect is

\[ \alpha = \frac{c_1(\Delta Y \mid A_1A_1 \to A_1A_2) + c_2(\Delta Y \mid A_1A_2 \to A_2A_2)}{c_1 + c_2}, \tag{11} \]

where the weights are

\[ c_1 = P(Q + R), \qquad c_2 = R(P + Q). \]
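The Appendix is not reproduced in this section, so the following is only a sketch of one way to arrive at the weights. It assumes that the substitutions A1A1 → A1A2 and A1A2 → A2A2 are applied to small proportions a and b of the population; these two symbols are introduced here for illustration and are not the authors' notation.

```latex
\begin{align*}
& dP = -a, \qquad 2\,dQ = a - b, \qquad dR = b, \\
& \frac{dP}{P} + \frac{dR}{R} = \frac{2\,dQ}{Q}
  \;\Longrightarrow\; -\frac{a}{P} + \frac{b}{R} = \frac{a - b}{Q}
  \;\Longrightarrow\; \frac{a}{b} = \frac{P(Q + R)}{R(P + Q)} = \frac{c_1}{c_2}, \\
& dp = dQ + dR = \tfrac{1}{2}(a + b), \\
& d\mu = a\,(\Delta Y \mid A_1A_1 \to A_1A_2) + b\,(\Delta Y \mid A_1A_2 \to A_2A_2), \\
& \frac{d\mu}{dp}
  = 2\,\frac{c_1(\Delta Y \mid A_1A_1 \to A_1A_2) + c_2(\Delta Y \mid A_1A_2 \to A_2A_2)}{c_1 + c_2}
  = 2\alpha .
\end{align*}
```

The factor of 2 arises because each substitution changes one gene out of the two carried by each individual, which is why the change in the mean is described as proportional, rather than equal, to the average effect.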

Let us recapitulate the meaning of (11). Immediately after fertilization we take a
random sample of the zygotes bearing the genotype A1A1. We then randomly assign
some of these zygotes to the "treatment," which consists of changing the allelic type of
a gene from A1 to A2. The expected difference in phenotype between treatments and
controls at the time of measurement is the causal effect of the gene substitution. We
perform the analogous experiment to determine the causal effect of changing A1A2 to
A2A2. The weighted average of the two causal effects, where the weights c1 and c2
are chosen so as to preserve λ if the two types of gene substitutions are applied to the
population in the ratio c1/c2, is the average effect of gene substitution holding constant
the rules governing the frequencies of the different genotypes.

Now that the average effect has been defined in (11), we can apply it to an example of
a population changing in mean phenotypic value under a sequence of gene substitutions
(Table 1). This example may be seen as a numerical counterpart to the diagrammatic
illustration by Edwards (2002). Suppose that the effect of changing an A1A1 individual to
A1A2 is 3 phenotypic units, whereas the effect of changing A1A2 to A2A2 is −2. Suppose
also that the numbers of the genotypes A1A1, A1A2, and A2A2 in this population are
40, 40, and 20 respectively. These genotype frequencies imply that (c1, c2) is proportional
to (4, 3). Table 1 shows how the average phenotypic change and λ are affected by each
step in a sequence of gene substitutions leading to an increase in p but tending to keep
λ constant. The first column gives the gene substitution. In this sequence the two types
of substitution alternate, but this is not an essential feature. The second column gives
the numbers of the genotypes after the gene substitution. The third column gives the
cumulative change in the total phenotypic measurements (the mean phenotype times the
population size) divided by the number of gene substitutions. The fourth column gives
the new value of λ after the gene substitution.

Table 1: Sequence of experimental gene substitutions yielding the average effect.

experimental change   genotype numbers   Δ(μN) / number of changes   λ
(none)                40, 40, 20                                     1/2
A1A1 → A1A2           39, 41, 20         3                           .5387821
A1A2 → A2A2           39, 40, 21         1/2                         .4884005
A1A1 → A1A2           38, 41, 21         4/3                         .5266291
A1A2 → A2A2           38, 40, 22         1/2                         .4784689
A1A1 → A1A2           37, 41, 22         1                           .5162776
A1A2 → A2A2           37, 40, 23         1/2                         .4700353
A1A1 → A1A2           36, 41, 23         6/7                         .5075483

It is readily confirmed that the final value of λ is the closest to the starting value of
1/2 that can be achieved with 7 gene substitutions. If we take population size to infinity,
we can make the discrepancy between the original and new values of λ as small as we
please.

In the special case of genotype-environment independence considered so far, where
equalities such as E(Y | A1A1) = E[Y | do(A1A1)] always hold, Fisher's experimental and
regression definitions of the average effect coincide for constant λ. In the example above,
after assigning each genotype an expected phenotypic value consistent with the magnitudes
of the experimental effects, it is easily verified that the slope in the least-squares
regression of phenotypic value on A2 gene count is 6/7.
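The entries of Table 1 can be regenerated with a few lines of exact arithmetic. The sketch below takes the starting genotype numbers (40, 40, 20) and the two substitution effects (3 and −2) from the example; the variable names and the strictly alternating loop are choices made here for illustration.

```python
from fractions import Fraction

EFFECT = {"A1A1->A1A2": 3, "A1A2->A2A2": -2}   # phenotypic units, from the example

def lam(n11, n12, n22):
    """Q^2/(PR) from genotype counts; Q is half the heterozygote frequency."""
    return Fraction(n12 * n12, 4 * n11 * n22)

n11, n12, n22 = 40, 40, 20
rows, total_change = [], 0
for step in range(1, 8):
    change = "A1A1->A1A2" if step % 2 == 1 else "A1A2->A2A2"   # alternate the two types
    if change == "A1A1->A1A2":
        n11, n12 = n11 - 1, n12 + 1
    else:
        n12, n22 = n12 - 1, n22 + 1
    total_change += EFFECT[change]
    rows.append((change, (n11, n12, n22), Fraction(total_change, step), lam(n11, n12, n22)))

for change, numbers, per_change, ratio in rows:
    print(f"{change}  {numbers}  {str(per_change):>4}  {float(ratio):.7f}")

# Weighted average of the two effects at the starting frequencies P, 2Q, R = 0.4, 0.4, 0.2.
P, Q, R = Fraction(2, 5), Fraction(1, 5), Fraction(1, 5)
c1, c2 = P * (Q + R), R * (P + Q)
alpha = (c1 * EFFECT["A1A1->A1A2"] + c2 * EFFECT["A1A2->A2A2"]) / (c1 + c2)
print("c1 : c2 =", c1 / c2, "  alpha =", alpha)
```

After seven substitutions, four of the first kind and three of the second, the cumulative change per substitution is 6/7, which matches the weighted average computed from the weights in the ratio 4 : 3.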

5. Gene-Environment Correlation and Interaction 

As a preliminary matter, we note that any variable along a causal path (in the sense of 
Wright and Pearl) from genotype to phenotype must not be counted as environmental. For 
example, if dairy consumption affects stature, it is tempting to regard dairy consumption 
as an environmental (non-genotypic) variable with respect to stature. But if genetic 
variation affects lactose tolerance and thus the amount of milk consumed, assigning the 
effect of dairy consumption on stature to the environment ignores the fact that the path 
genotype → lactose tolerance → dairy consumption → stature ultimately begins with a
genetic variable. This subtlety may have been among the reasons why Fisher favored
"speaking of the residue as non-genetic, rather than environmental ..." (Bennett, 1983,
p. 260).

It is worth asking whether Fisher intended the average effect to be defined in the event
that genotypic and environmental causes are either dependent or non-additive. In many
places he certainly assumed or argued for independence and additivity (Fisher, 1918, 1941,
1953, 1970), and it has been asserted that Fisher's biometrical theory is meaningless if
these conditions are not met (e.g., Vetta, 1980).



As Price (1972) has pointed out, Fisher's exposition in The Genetical Theory leaves
much to be desired. A close reading of this text and Fisher's other writings, however, turns
up many reasons to suspect that Fisher regarded independence and additivity as reasonable
specifications for certain demonstrations and not as strictly necessary conditions for
the average effect to be defined.

1. In the discussion of the average effect in The Genetical Theory, Fisher did not 
explicitly refer to his other work where he made special assumptions regarding the 
environment. 

2. The average effect is a key concept in the FTNS, which Fisher regarded as an exact 
and rigorous statement. One would like to believe that Fisher, having been trained 
in mathematical physics, would not have compared the FTNS to the second law of 
thermodynamics if the FTNS depended on assumptions regarding the environment 
that must always be approximations at best. 

3. We can read that "[t]he genetic variance as here defined is only a portion of the
variance determined genotypically, and this will differ from, and usually be somewhat
less than, the total variance to be observed" (Fisher, 1930, p. 34). The genotypic
variance is greater than the total variance only if "good" genotypes tend to be found
in "bad" environments, and thus Fisher was clearly allowing for the possibility of
dependence.

4. In a letter to J. A. Fraser Roberts, Fisher wrote that 

[t]here is one point in which Hogben and his associates are riding for a fall, 
and that is in making a great song about the possible, but unproved, im- 
portance of non-linear interactions between hereditary and environmental 
factors. . . . What they do not see is that we ordinarily count as genetic
only such part of the genetic effect as may be included in a linear formula
and that we make a present to the environmentalists of such variation
due to the combined action of genetic and environmental factors as is not
expressible in such a formula. (Bennett, 1983, p. 260)



These remarks clearly show that Fisher did not regard genotype-environment inter- 
action as an obstacle to defining the average effect. 

Emboldened by this evidence regarding the intended generality of the average effect, we 
extend our treatment to encompass gene-environment correlation and interaction. 

We first suppose that genotypic and environmental causes act additively but are not 
independent. Additivity means that the experimental effect of a gene substitution remains 
the same regardless of the environment in which the experiment is carried out; varying the 
environment simply raises or lowers the expected phenotypic values of all three genotypes 
by the same amount. For instance, 

\[ \Delta Y \mid A_1A_1 \to A_1A_2, \mathcal{E}_i = \Delta Y \mid A_1A_1 \to A_1A_2, \mathcal{E}_j \tag{12} \]

for any choice of environments ℰ_i and ℰ_j. In this case all of the discussion in previous
sections continues to apply except for the equivalence of the experimental and regression 
average effects. If some genotypes are more frequently found in favorable environments 
for phenotypic development, then the regression of phenotypic value on gene count does 
not have a simple genetic interpretation. 
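A small calculation shows how gene-environment correlation separates the two definitions even when effects are additive. The numbers below are assumptions chosen for illustration: genotype frequencies of 0.4, 0.4, 0.2, genotypic contributions of 0, 3, 1, a favorable environment that adds 5 units to every genotype, and a probability of being in that environment that depends on genotype. The helper names are hypothetical.

```python
from fractions import Fraction

def wls_slope(weights, xs, ys):
    """Slope of the frequency-weighted least-squares regression of ys on xs."""
    mx = sum(w * x for w, x in zip(weights, xs))
    my = sum(w * y for w, y in zip(weights, ys))
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    return cov / var

P, Q, R = Fraction(2, 5), Fraction(1, 5), Fraction(1, 5)
weights, gene_counts = [P, 2 * Q, R], [0, 1, 2]

genetic = [Fraction(0), Fraction(3), Fraction(1)]   # assumed genotypic contributions
env_bonus = 5                                       # assumed additive effect of a good environment

def observed_means(p_good):
    """Observed genotype means when genotype g sits in the good environment with prob p_good[g]."""
    return [g + env_bonus * pg for g, pg in zip(genetic, p_good)]

independent = observed_means([Fraction(1, 2)] * 3)
correlated = observed_means([Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)])

slope_independent = wls_slope(weights, gene_counts, independent)
slope_correlated = wls_slope(weights, gene_counts, correlated)

# Experimental average effect: additivity makes the substitution effects the
# same in every environment, so only the genotypic contributions enter.
c1, c2 = P * (Q + R), R * (P + Q)
experimental = (c1 * (genetic[1] - genetic[0]) + c2 * (genetic[2] - genetic[1])) / (c1 + c2)

print("experimental average effect:  ", experimental)
print("regression slope, independent:", slope_independent)
print("regression slope, correlated: ", slope_correlated)
```

With independence the regression slope equals the experimental average effect of 6/7, whereas with the assumed correlation it rises to 33/14, although no substitution effect has changed.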

Non-additivity means that at least one equality of the kind in (12) does not hold.
The precise magnitude of the expected change upon an experimental gene substitution
now depends on some aspect of the environment that the manipulated zygote will
experience between the onset of development and the time of measurement. This case is
problematic because now a quantity such as ΔY | A1A1 → A1A2 is not necessarily equal
in magnitude to ΔY | A1A2 → A1A1, since the genotypes A1A1 and A1A2 may tend to be
found in different environments. This difficulty can be overcome by redefining expressions
such as ΔY | A1A1 → A1A2 so that each symbolizes a difference between experimental
treatments rather than a difference between a treatment and an unperturbed control group.
For example, (4) would become

\[ \Delta Y \mid A_1A_1 \to A_1A_2 = \mathrm{E}[Y \mid \mathrm{do}(A_1A_2)] - \mathrm{E}[Y \mid \mathrm{do}(A_1A_1)]. \]

Seeking an equivalent generalization that retains the interventional form of (4) and (5),
however, sheds substantially greater light on the problem.

Before taking up the issue of gene-environment interaction, it is helpful to review
Fisher's motivation for holding λ constant as a means to address gene-gene interaction.
In order to formulate the FTNS, Fisher wished to quantify the causal effect of changing
allele frequency while holding the environment constant. In his view the way in which
alleles combine to form genotypes, as parameterized by λ, should be regarded as part
of the environment. Although this choice may initially seem eccentric, because fitness
differences among genotypes will typically change both p and λ, it becomes reasonable
when we realize that λ may also change as a result of extrinsic events such as the formation
or dissolution of geographical hindrances to random mating.

There is an analogy here to Fisher's analysis of covariance to separate the direct and
indirect effects of a given experimental manipulation on a focal outcome. For instance,
in an experiment to determine whether a given fertilizer affects the purity of sugar
extracted from sugar-beets, the experimenter may already know that the fertilizer affects
the weight of the beet roots, which in turn affects sugar purity (Fisher, 1970, pp. 283-284).
The experimenter may wish to know whether the fertilizer affects sugar purity through
a direct causal path, fertilizer → sugar purity, distinct from the indirect path fertilizer
→ root weight → sugar purity. In certain cases adjustment for root weight by analysis
of covariance yields the target quantity: the amount by which sugar purity would change
upon application of the fertilizer, if root weight could be experimentally clamped to the
value that it would have obtained in the control condition. Similarly, while gene
substitutions that are not deliberately balanced as in (11) will typically change both p
and λ, we can still mathematically define an average effect stipulating that λ remains
clamped to a constant value. This point of view is similar to one expressed by Okasha
(2008).

Once we regard any change in how alleles are arranged into genotypes as environmen- 
tally caused, it perhaps becomes obvious that we should regard certain changes in the 
allotment of genotypes to environments as such. After all, a redistribution among envi- 
ronments might lead to changes in the phenotypic means of the genotypes. Such changes 
in the genotype-phenotype mapping, when caused by extrinsic events such as climate 
change, are readily classified as environmental in nature. This consideration suggests that 
the gene substitutions defining the average effect in the presence of genotype-environment 
interaction should be balanced in such a way that the phenotypic means of the genotypes 
remain constant. 

Since equalities such as $\mathbb{E}(Y \mid A_1A_1) = \mathbb{E}[Y \mid \mathrm{do}(A_1A_1)]$ do not hold when genotypes
and environments are also dependent, there is ambiguity in what is meant by holding
constant the phenotypic means. We first consider holding constant the observed means.
If the environments interacting with genotypes can be classified discretely, then we can
write an equation like

$$\mathbb{E}(Y \mid A_1A_1) = \sum_i P(\mathcal{E}_i \mid A_1A_1)\, \mathbb{E}(Y \mid A_1A_1, \mathcal{E}_i) \tag{13}$$

for each genotype. Because genotypes and environments exhaust all possible causes of
phenotypic variation, $\mathbb{E}(Y \mid A_1A_1, \mathcal{E}_i)$ is equivalent to $\mathbb{E}[Y \mid \mathrm{do}(A_1A_1), \mathrm{do}(\mathcal{E}_i)]$. In a sense
even the expectation operator is unnecessary because $Y$ is a deterministic function when
both genotype and environment are specified.

Constancy of observed means requires constancy of the conditional probabilities taking
the form $P(\mathcal{E}_i \mid A_1A_1)$. A candidate definition for the average effect is then

$$2\alpha\, dp = \mu[p + dp, \lambda, P(\mathcal{E}_1 \mid A_1A_1), \ldots, P(\mathcal{E}_n \mid A_2A_2)]
 - \mu[p, \lambda, P(\mathcal{E}_1 \mid A_1A_1), \ldots, P(\mathcal{E}_n \mid A_2A_2)].$$

The problem with this candidate definition, however, is that it can lead to a nonzero 
average effect even if in each environment neither gene substitution has a causal effect. 
This is because preserving a genotype's conditional probabilities of being found in the 
various environments may require that some gene substitutions be accompanied by the 
placement of the manipulated organism in a different environment; the resulting change 
in phenotype may then be entirely the result of the environmental change. 

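If we instead consider holding constant the experimental means, then we obtain

$$\begin{aligned}
\mathbb{E}[Y \mid \mathrm{do}(A_1A_1)] &= \sum_i P[\mathcal{E}_i \mid \mathrm{do}(A_1A_1)]\, \mathbb{E}[Y \mid \mathrm{do}(A_1A_1), \mathcal{E}_i] \\
&= \sum_i P(\mathcal{E}_i)\, \mathbb{E}(Y \mid A_1A_1, \mathcal{E}_i).
\end{aligned} \tag{14}$$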

The left-hand side is the expected phenotypic value upon sampling a zygote at random
and, if its genotype is not $A_1A_1$, making it so. Since changing the genotype of a zygote
cannot affect its environment, we have $P[\mathcal{E}_i \mid \mathrm{do}(A_1A_1)] = P(\mathcal{E}_i)$ for each $i$ and thus a
justification of the second line. Therefore preserving the experimental means only requires
a constant marginal distribution of environmental states. Of course, we can always abide
by this constraint if we never foster any manipulated organism in a different environment.
This ensures that a nonzero average effect is indeed an average of genetic effects, at least
one of which would turn out to be nonzero under experimental control.
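
The contrast between observed means, as in (13), and experimental means, as in (14), can be
made concrete with a toy calculation. The sketch below uses hypothetical numbers throughout:
two environments, phenotypic values that depend only on the environment, and genotypes that
are unevenly distributed across environments. The dictionary names are illustrative.

```python
# Hypothetical two-environment example; every number here is illustrative.
# Phenotypic value for each genotype-environment pair: no genotype has any
# causal effect, because the value depends only on the environment.
value = {
    "A1A1": {"E1": 1.0, "E2": 3.0},
    "A1A2": {"E1": 1.0, "E2": 3.0},
    "A2A2": {"E1": 1.0, "E2": 3.0},
}
# Conditional distribution of environments given genotype, P(E_i | genotype).
p_env_given_genotype = {
    "A1A1": {"E1": 0.8, "E2": 0.2},
    "A1A2": {"E1": 0.5, "E2": 0.5},
    "A2A2": {"E1": 0.2, "E2": 0.8},
}
genotype_freq = {"A1A1": 0.25, "A1A2": 0.50, "A2A2": 0.25}

# Marginal distribution of environments, P(E_i).
p_env = {
    e: sum(genotype_freq[g] * p_env_given_genotype[g][e] for g in genotype_freq)
    for e in ("E1", "E2")
}


def observed_mean(genotype):
    """Mean weighted by P(E_i | genotype): the observed quantity."""
    return sum(p_env_given_genotype[genotype][e] * value[genotype][e] for e in p_env)


def experimental_mean(genotype):
    """Mean weighted by the marginal P(E_i): the experimental quantity."""
    return sum(p_env[e] * value[genotype][e] for e in p_env)
```

With these numbers the observed means differ across genotypes (1.4, 2.0, 2.6) even though
the experimental means are all equal to 2.0, so only the latter reflect the absence of any
genetic effect.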

Hence a natural definition of the average effect in the presence of genotype-environment
interaction is

$$\alpha = \frac{\sum_i \left[ c_{1,i}\, (\Delta Y \mid A_1A_1 \to A_1A_2, \mathcal{E}_i) + c_{2,i}\, (\Delta Y \mid A_1A_2 \to A_2A_2, \mathcal{E}_i) \right]}{\sum_i \left( c_{1,i} + c_{2,i} \right)}, \tag{15}$$

where

$$c_{1,i} = c_1 P(\mathcal{E}_i), \qquad c_{2,i} = c_2 P(\mathcal{E}_i).$$
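
As a minimal sketch of how (15) might be evaluated, the function below takes the two
balancing weights, the marginal environmental distribution, and the per-environment causal
effects of the two gene substitutions. The function name, argument names, and the numerical
inputs are hypothetical; the weights $c_1$ and $c_2$ are assumed to have been determined
already by the balancing condition on $\lambda$.

```python
def average_effect(c1, c2, p_env, delta_y_first, delta_y_second):
    """Weighted average of substitution effects across environments.

    c1, c2          -- weights of the substitutions A1A1 -> A1A2 and A1A2 -> A2A2
    p_env           -- marginal probabilities P(E_i) of the environments
    delta_y_first   -- causal effect of A1A1 -> A1A2 in each environment
    delta_y_second  -- causal effect of A1A2 -> A2A2 in each environment
    """
    numerator = sum(
        c1 * p * d1 + c2 * p * d2
        for p, d1, d2 in zip(p_env, delta_y_first, delta_y_second)
    )
    denominator = sum(c1 * p + c2 * p for p in p_env)
    return numerator / denominator


# Illustrative inputs only.
alpha_example = average_effect(3, 4, [0.5, 0.5], [1.0, 0.0], [0.5, 1.5])
alpha_null = average_effect(3, 4, [0.5, 0.5], [0.0, 0.0], [0.0, 0.0])
```

When every per-environment effect is zero the result is zero, which is the property that
the candidate definition based on observed means fails to guarantee.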

6. Average Effects of Individual Alleles 

We will now explain how the experimental average effect of an individual allele may
be defined for a locus with any number of alleles. Since there are $\binom{n}{2}$ possible gene
substitutions at a locus with $n$ alleles, we can no longer speak of a single average effect in
the case of $n > 2$, and thus an extension of this kind is plainly necessary. In the second
edition of The Genetical Theory, we can read that "[w]ith multiple allelomorphism it is
convenient to define [the average effect of an allele] by the effect of substituting any chosen
gene for a random selection of the genes homologous with it" (Fisher, 1958b, p. 35). This
definition can be explicated with respect to a given allele, say $A_1$, as follows. Immediately
after fertilization but before the onset of any developmental events, we select the allelic
type of a gene to be changed into $A_1$ in such a way that the probabilities of selection are
equal to the allele frequencies. That is, if the vector of allele frequencies is $(p_1, \ldots, p_n)$,
then the gene to be changed is $A_1$ with probability $p_1$, $A_2$ with probability $p_2$, and so on.
If the gene to be changed happens to be $A_1$ itself, then the $A_1 \to A_1$ change will have
no phenotypic consequence. For all changes other than the null change, the choice of the
undisturbed gene in the genotype is made in such a way that the population's rules of
genotype formation are preserved. If genotypes and environments are both dependent and
interacting, then the marginal distribution of environmental states must be considered as
in (15). The expected change in the phenotype of the manipulated organism is then $\alpha_1$,
the average effect of $A_1$.

From this definition we can derive some important consequences. Let $N_k$ stand for
the number of $A_k$ genes in the population. The total number of genes is $\sum_{k=1}^{n} N_k = N$.
Among the $n$ experiments defining the individual average effects, choose one to perform
with a probability equal to its corresponding allele frequency. The expected vector of
allele frequencies following the randomly chosen experiment is then



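$$\sum_{k=1}^{n} \frac{N_k}{N} \sum_{\ell=1}^{n} \frac{N_\ell}{N} \left[ \left( \frac{N_1}{N}, \ldots, \frac{N_n}{N} \right) + \frac{e_k}{N} - \frac{e_\ell}{N} \right], \tag{16}$$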



where $e_k$ is the vector of length $n$ with element unity at position $k$ and zeroes
elsewhere. After some algebra we find that the first element of the expected vector is
$N_1 \left( \sum N_k \right) / N^2 = p_1$, the second is $N_2 \left( \sum N_k \right) / N^2 = p_2$, and so on. The expected
outcome of the randomly chosen experiment is a population with exactly the same allele
frequencies, rules of genotype formation, and phenotypic mean. We have thus proved that
the experimental average effects satisfy

$$\sum_{k=1}^{n} p_k \alpha_k = 0. \tag{17}$$
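
The bookkeeping behind this argument can be checked by exact enumeration. The sketch below
is an illustration rather than part of the original derivation: it assumes that the
experiment defining $\alpha_k$ is chosen with probability $N_k/N$ and that the gene to be
replaced is of type $A_\ell$ with probability $N_\ell/N$, and it tracks only allele counts.
The function names are hypothetical.

```python
from fractions import Fraction


def expected_counts_single(counts, k):
    """Expected allele counts after the experiment that converts a random gene to A_k."""
    n_total = sum(counts)
    expected = [Fraction(0)] * len(counts)
    for l, n_l in enumerate(counts):          # the gene to be replaced is of type A_l
        prob = Fraction(n_l, n_total)
        for m, n_m in enumerate(counts):
            new_count = n_m + (m == k) - (m == l)   # gain one A_k, lose one A_l
            expected[m] += prob * new_count
    return expected


def expected_counts(counts):
    """Expected allele counts after choosing experiment k with probability N_k / N."""
    n_total = sum(counts)
    expected = [Fraction(0)] * len(counts)
    for k, n_k in enumerate(counts):
        single = expected_counts_single(counts, k)
        for m in range(len(counts)):
            expected[m] += Fraction(n_k, n_total) * single[m]
    return expected


biallelic = expected_counts([80, 120])        # counts from the biallelic example
triallelic = expected_counts([10, 25, 65])    # arbitrary illustrative counts
after_alpha_1 = expected_counts_single([80, 120], 0)
```

The randomly chosen experiment leaves the expected counts unchanged, whereas the single
experiment converting a random gene to $A_1$ raises the expected number of $A_1$ genes from
80 to 80.6.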

With the generalization of the experimental average effect given in the next section,
(17) holds at any one of arbitrarily many multiallelic loci. In the case of a single locus,
(17) holds for the regression average effects in (8) (Ewens, 2011), and agreement of the
regression and experimental average effects thus requires the mean subtraction in that
expression.

Let us apply the definition of the individual average effect to the biallelic example
in Table 1. There are initially 120 $A_2$ genes in this population of 200 total genes. If
we perform the experiment defining $\alpha_1$, then with probability .40 the population gene
numbers remain at (80, 120) and with probability .60 the numbers become (81, 119). In
the event of a non-null substitution, with probability 4/7 the change
is $A_1A_2 \to A_1A_1$ and with probability 3/7 it is $A_2A_2 \to A_1A_2$. The
expected outcome of the experiment is thus a population with gene numbers (80.6, 119.4)
and, up to the limits of finite size, the same value of $\lambda$. Using simple probability calculus,
we can calculate that the numerical value of $\alpha_1$ is $-18/35$.

In summary, the experiment defining $\alpha_1$ will lead to the null substitution $A_1 \to A_1$
with probability $p_1$ (in which case the causal effect is zero) and to the substitution
$A_2 \to A_1$ with probability $p_2$ (in which case the effect is equal in magnitude to the average
effect of gene substitution with respect to the entire locus). Therefore $\alpha_1$ must be equal
to $(p_1)(0) + (p_2)(-\alpha)$, and from this we can use $p_1\alpha_1 + p_2\alpha_2 = 0$ to derive $\alpha = \alpha_2 - \alpha_1$
algebraically. The meaning of this relation among the three average effects is as follows.
The expected outcome of the experiment defining $\alpha_2$ is a population with gene numbers
(79.6, 120.4) and nearly the same value of $\lambda$. Now suppose that we perform the "opposite"
of the experiment defining $\alpha_1$, on average reducing the number of $A_1$ genes rather than
increasing it. We compose this experiment with the one defining $\alpha_2$, which in our
example has a numerical value of 12/35. The population is thus expected to proceed
through the sequence (80, 120) → (79.4, 120.6) → (79, 121), preserving $\lambda$ at each step.
The final state is precisely the one expected upon performing the experiment defining
$\alpha$, the average effect of gene substitution for the entire locus. We can see in what sense
the average effect of gene substitution (6/7) is equal to the effect of removing one gene
(18/35) and then replacing it with another (12/35).
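
The arithmetic of this example can be verified with exact fractions. The sketch below takes
the allele frequencies and the value $\alpha = 6/7$ from the text as given; the variable names
are illustrative.

```python
from fractions import Fraction

p1, p2 = Fraction(80, 200), Fraction(120, 200)   # frequencies of A1 and A2
alpha = Fraction(6, 7)                           # average effect of gene substitution

# Experiment defining alpha_1: null change with probability p1,
# substitution A2 -> A1 (effect -alpha) with probability p2.
alpha_1 = p1 * 0 + p2 * (-alpha)
# Experiment defining alpha_2: substitution A1 -> A2 (effect +alpha)
# with probability p1, null change with probability p2.
alpha_2 = p1 * alpha + p2 * 0

# Expected gene numbers along the composed sequence described in the text.
start = (Fraction(80), Fraction(120))
after_removal = (start[0] - p2, start[1] + p2)                     # opposite of the alpha_1 experiment
after_replacement = (after_removal[0] - p1, after_removal[1] + p1)  # alpha_2 experiment
```

The values reproduce $\alpha_1 = -18/35$, $\alpha_2 = 12/35$, and the sequence of expected gene
numbers ending at (79, 121).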

7. Average Effects in the Case of Multiple Loci

In the case of a single locus with two alleles, we can just as well define the average effect
of gene substitution as

$$\alpha = \frac{1}{2} \frac{\partial \mu(p, \lambda)}{\partial p}, \tag{18}$$

where $\mu$ is defined as in (9). From this starting point, we can derive the equivalence of
the regression (7) and experimental (11) definitions in the case of genotype-environment
independence. (18) fills the lacuna in Wright's casual use of the expression

$$\frac{dW}{dp},$$

to which Fisher (1941) strongly objected. The explicit dependence of $\mu$ on $\lambda$, a measure
of departure from random combination of genes, meets the criticism that "the numerator
involves the average of [the phenotype] for a number of different genotypes . . . exceeding
the number of gene frequencies $p$ on which their frequencies are taken to depend" (p. 57).
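
The claimed equivalence can be checked numerically for a single biallelic locus. The sketch
below makes several assumptions that should be read as illustrative: $p$ is taken to be the
frequency of $A_2$; the genotype frequencies of $A_1A_1$, $A_1A_2$, and $A_2A_2$ are written
$P$, $2Q$, and $R$ with $\lambda = Q^2/(PR)$; genotypes and environments are independent; and the
genotypic means are arbitrary. The derivative is approximated by a central difference, and
the function names are hypothetical.

```python
import math


def genotype_freqs(p2, lam):
    """Return (P, Q, R) with P + Q = 1 - p2, Q + R = p2, and Q**2 = lam * P * R."""
    p1 = 1.0 - p2
    disc = lam * lam + 4.0 * (1.0 - lam) * lam * p1 * p2
    q = 2.0 * lam * p1 * p2 / (lam + math.sqrt(disc))   # root lying in (0, min(p1, p2))
    return p1 - q, q, p2 - q


def population_mean(p2, lam, means):
    P, Q, R = genotype_freqs(p2, lam)
    return P * means[0] + 2.0 * Q * means[1] + R * means[2]


def alpha_by_derivative(p2, lam, means, h=1e-6):
    """Half the derivative of the mean with respect to p2, holding lam constant."""
    upper = population_mean(p2 + h, lam, means)
    lower = population_mean(p2 - h, lam, means)
    return 0.5 * (upper - lower) / (2.0 * h)


def alpha_by_regression(p2, lam, means):
    """Slope of the regression of genotypic mean on A2 gene count."""
    P, Q, R = genotype_freqs(p2, lam)
    freqs, counts = (P, 2.0 * Q, R), (0, 1, 2)
    mean_x = sum(f * x for f, x in zip(freqs, counts))
    mean_y = sum(f * y for f, y in zip(freqs, means))
    cov = sum(f * (x - mean_x) * (y - mean_y) for f, x, y in zip(freqs, counts, means))
    var = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, counts))
    return cov / var


means = (1.0, 2.5, 3.0)                       # illustrative genotypic means
derivative_value = alpha_by_derivative(0.6, 0.5, means)
regression_value = alpha_by_regression(0.6, 0.5, means)
```

With $p = 0.6$ and $\lambda = 0.5$, corresponding to genotype frequencies (0.2, 0.4, 0.4), both
calculations give 13/14 up to numerical error.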

It is interesting that the only genetic condition governing the gene substitutions defining
the average effect for a single biallelic locus is the constancy of $\lambda$, a parameter that
depends on the genotype frequencies but not the genotypic means. One might have thought
that these means, appearing as they do in (7), must play some role in the weighting of
the two possible gene substitutions. It is then natural to ask whether the generalization
to multiple loci retains the appealing feature that constancy of appropriately quantified
departures from Hardy-Weinberg and linkage disequilibrium is sufficient, without any
additional information regarding the genotypic means, for an experimental average effect
to agree with its corresponding partial regression coefficient. According to our analysis in
the Appendix, the multilocus average effects do not in fact retain this feature. That is,
we would like to define the multilocus average effect of allele $i_k$ at locus $k$, $A^{(k)}_{i_k}$, as

$$\alpha^{(k)}_{i_k} = \frac{1}{2} \frac{\partial \mu(p, \lambda)}{\partial p^{(k)}_{i_k}}, \tag{19}$$

where $p$ is now a vector of allele frequencies at several loci, $p^{(k)}_{i_k}$ being the element
corresponding to $A^{(k)}_{i_k}$, and $\lambda$ is a vector of whatever measures of departure from random
combination are preserved under the appropriately balanced gene substitutions. However,
as will be demonstrated, such a mean-invariant description of the average effects does not
seem to exist.

To set up a weaker definition of the multilocus average effects, we require some
additional definitions and notational conventions. Suppose that there are $L$ causal loci, in
the sense of (3), affecting the focal phenotype. Suppose also that there are $n_\ell$ alleles $A^{(\ell)}_{i_\ell}$
($i_\ell = 1, \ldots, n_\ell$) at locus $\ell$. We have already stipulated that $p^{(\ell)}_{i_\ell}$ is the frequency of allele
$A^{(\ell)}_{i_\ell}$. Put $i = (i_1, \ldots, i_L)$ and denote the gamete $A^{(1)}_{i_1} \cdots A^{(L)}_{i_L}$ by the multi-index $i$. In
addition, denote the frequency of the ordered multilocus genotype containing gametes $i$
and $j$ as $P_{ij}$.

Define the coefficient of departure from random combination,

$$\theta_{ij} = \frac{P_{ij}}{\prod_{k=1}^{L} p^{(k)}_{i_k}\, p^{(k)}_{j_k}}, \tag{20}$$

as the ratio of the frequency of the (ordered) whole-genome genotype $ij$ to the product of its
constituent allele frequencies. The $\theta_{ij}$ are thus measures of both Hardy-Weinberg and linkage
disequilibrium; they are all equal to unity if and only if the rules of genotype formation call for
the random combination of all genes. Special cases of this coefficient were introduced by
Kimura (1958), although Nagylaki (1992) has pointed out that some of Kimura's expressions
employing these coefficients are incorrect. To capture how the experimental gene
substitutions defining the average effects change the departures from random combination,
let

$$\mathring{\theta}_{ij} = \frac{\Delta P_{ij}}{P_{ij}} - \sum_{k=1}^{L} \left[ \frac{\Delta p^{(k)}_{i_k}}{p^{(k)}_{i_k}} + \frac{\Delta p^{(k)}_{j_k}}{p^{(k)}_{j_k}} \right] \tag{21}$$

denote the relative change in $\theta_{ij}$. In the limit of infinitesimal changes, this is equivalent
to the logarithmic derivative of $\theta_{ij}$.
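
As an illustration of the coefficient of departure from random combination, the sketch
below computes $\theta_{ij}$ from a table of ordered genotype frequencies. The data layout
(gametes as tuples of allele indices, one list of allele frequencies per locus) and all
numerical values are assumptions made for the example. For a single biallelic locus with
genotype frequencies $P$, $2Q$, $R$, the sketch also uses the identity
$\lambda = Q^2/(PR) = \theta_{12}^2 / (\theta_{11}\theta_{22})$, which follows because the allele
frequencies cancel.

```python
from itertools import product
from math import isclose, prod


def theta(genotype_freq, allele_freqs):
    """Ratio of each ordered genotype frequency to the product of its allele frequencies.

    genotype_freq -- maps (i, j) to P_ij, where i and j are gametes given as
                     tuples holding one allele index per locus
    allele_freqs  -- allele_freqs[k][a] is the frequency of allele a at locus k
    """
    n_loci = len(allele_freqs)
    return {
        (i, j): p_ij
        / prod(allele_freqs[k][i[k]] * allele_freqs[k][j[k]] for k in range(n_loci))
        for (i, j), p_ij in genotype_freq.items()
    }


# Two biallelic loci under random combination of all genes (illustrative frequencies).
two_locus_freqs = [[0.3, 0.7], [0.6, 0.4]]
gametes = list(product(range(2), range(2)))
gamete_freq = {g: two_locus_freqs[0][g[0]] * two_locus_freqs[1][g[1]] for g in gametes}
random_genotypes = {(i, j): gamete_freq[i] * gamete_freq[j] for i in gametes for j in gametes}
theta_random = theta(random_genotypes, two_locus_freqs)

# One biallelic locus with P = 0.2, Q = 0.2 (each ordered heterozygote), R = 0.4.
one_locus = {((0,), (0,)): 0.2, ((0,), (1,)): 0.2, ((1,), (0,)): 0.2, ((1,), (1,)): 0.4}
theta_one = theta(one_locus, [[0.4, 0.6]])
lam = theta_one[((0,), (1,))] ** 2 / (theta_one[((0,), (0,))] * theta_one[((1,), (1,))])
```

Under random combination every coefficient equals unity, and in the single-locus example
the coefficients recover $\lambda = 0.5$.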

Now the experimenter must ascertain the mean of each whole-genome genotype by
experimental control and then fit the equation

$$\mathbb{E}[Y \mid \mathrm{do}(ij)] = \mu + \alpha_{ij} + \epsilon_{ij}, \quad \text{where } \alpha_{ij} = \alpha_i + \alpha_j, \quad \alpha_i = \sum_{k=1}^{L} \alpha^{(k)}_{i_k}, \tag{22}$$

to the treatment means thus obtained. The $\alpha^{(k)}_{i_k}$ are the average effects of the individual
alleles. The residuals $\epsilon_{ij}$ will reflect both dominance and epistasis, and in the general
case it does not seem profitable to separate the two in the manner that Kimura (1958)
attempted. The fitting is accomplished by seeking the vector of average effects, $\alpha$, that
minimizes the sum of squares

$$\sum_{ij} P_{ij}\, \epsilon_{ij}^2. \tag{23}$$



Whereas the minimization defines the $\epsilon_{ij}$ uniquely, the $\alpha^{(k)}_{i_k}$ are so far defined only up
to a constant term in the sense that one constant may be added to the average effects
at one locus and the same constant subtracted from the average effects at another locus
without changing the minimum sum of squares (Ewens, 2011). The experimental average
effect of a given allele, however, is obviously not defined only up to a constant term but
rather must be equal to the precise number determined by the experiment of replacing a
random homologous gene with a gene of the given allelic kind. In the Appendix we show
that performing a non-null substitution in this experiment, in a manner preserving the
rules of genotype formation, amounts to weighting the possible gene substitutions such
that the scalar quantity

$$\overline{\epsilon\mathring{\theta}} = \sum_{ij} P_{ij}\, \epsilon_{ij}\, \mathring{\theta}_{ij} \tag{24}$$

is equal to zero. Another way to phrase this key result is that the vanishing of $\overline{\epsilon\mathring{\theta}}$ is a
necessary and sufficient condition for the regression and experimental average effects to
coincide in the case of genotype-environment independence. Kimura (1958) showed that
constancy of $\lambda$ suffices for $\overline{\epsilon\mathring{\theta}}$ to vanish in the case of a single biallelic locus; it is worth
mentioning that even in this simplest possible case there do not generally exist changes
in the genotype frequencies such that each individual $\mathring{\theta}_{ij}$ vanishes.

Our theoretical experimenter can of course perform all $\sum_{k=1}^{L} n_k$ experiments to
determine the unique values of the elements in the vector $\alpha$. However, given our demonstration
that the mean of the experimental average effects at any given locus is equal to zero, it
suffices to impose (17) for each locus as a constraint on the minimization of (23). The
proof of (17) is still valid for each of multiple loci because the vanishing of $\overline{\epsilon\mathring{\theta}}$ along each
possible branch of the random experiment implies that the expected change in phenotypic
mean must be equal to

$$\frac{2}{N} \sum_{i_k = 1}^{n_k} \frac{N^{(k)}_{i_k}}{N}\, \alpha^{(k)}_{i_k}, \tag{25}$$

and since the expected outcome of the experiment is a population with the same allele
frequencies, (17) is assured.
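
For a single biallelic locus, the fitting procedure and the role of constraint (17) can be
sketched as follows. This is an illustration under stated assumptions, not the Appendix
procedure: the genotype frequencies and experimental means are hypothetical, the weighted
least-squares problem is solved with NumPy's `lstsq` (which returns one of the infinitely
many minimizers), and the constraint is then imposed by shifting the solution along the
direction that leaves the fitted values unchanged.

```python
import numpy as np


def fit_average_effects(genotype_freqs, experimental_means):
    """Fit mean = mu + alpha_i + alpha_j by weighted least squares, then impose
    p1 * alpha_1 + p2 * alpha_2 = 0.

    genotype_freqs     -- frequencies of A1A1, A1A2 (both orders pooled), A2A2
    experimental_means -- experimentally determined means of the three genotypes
    """
    freqs = np.asarray(genotype_freqs, dtype=float)
    means = np.asarray(experimental_means, dtype=float)
    # Columns: intercept, number of A1 genes, number of A2 genes (rank 2 only).
    design = np.array([[1.0, 2.0, 0.0],
                       [1.0, 1.0, 1.0],
                       [1.0, 0.0, 2.0]])
    weights = np.sqrt(freqs)
    solution, *_ = np.linalg.lstsq(design * weights[:, None], means * weights, rcond=None)
    mu, a1, a2 = solution
    p1 = freqs[0] + freqs[1] / 2.0
    p2 = freqs[2] + freqs[1] / 2.0
    # Subtracting a constant from both effects and adding twice that constant
    # to mu leaves every fitted value, and hence every residual, unchanged.
    shift = p1 * a1 + p2 * a2
    return mu + 2.0 * shift, a1 - shift, a2 - shift


mu, alpha_1, alpha_2 = fit_average_effects((0.2, 0.4, 0.4), (1.0, 2.5, 3.0))
```

After the constraint is imposed, the intercept equals the frequency-weighted mean of the
genotypic means, and $\alpha_2 - \alpha_1$ equals the slope of the regression on gene count.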

The vanishing of $\overline{\epsilon\mathring{\theta}}$ preserves the population's rules of genotype formation in the
following sense. Although the number of parameters required to describe departure from
random combination of genes increases very rapidly with the number of alleles and loci,
(24) implies it is not necessary for each and every such parameter to stay constant. It is
enough, roughly speaking, for the average change in these parameters to equal zero; $\overline{\epsilon\mathring{\theta}}$
is similar in form to the weighted average of the relative changes in the departures from
random combination, those genotypes with large non-additive residuals being weighted
more heavily.

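The expression

$$\alpha^{(k)}_{i_k} = \frac{1}{2} \left( \frac{\partial}{\partial p^{(k)}_{i_k}} \sum_{ij} P_{ij}\, \mathbb{E}[Y \mid \mathrm{do}(ij)] \right)_{\overline{\epsilon\mathring{\theta}} = 0} \tag{26}$$

may therefore serve as the definition of the experimental average effect in the case of
multiple loci.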

Let us recapitulate the meaning of (26). Our variable of interest is the population
average of the experimentally determined phenotypic means of the genotypes. If genotypes
and environments are dependent, this variable is not the same as the population mean
$\mathbb{E}(Y)$. Partial differentiation with respect to the frequency of allele $A^{(k)}_{i_k}$ indicates that we
examine how our variable of interest responds to the replacement of a small number of
randomly chosen homologous genes with genes of the given allelic kind. The constraint
on the partial derivative indicates that we consider only those counterfactual populations
that can be reached from the original population by experimental replacements that result
in the vanishing of (24). The factor of $\frac{1}{2}$ is owed to diploidy.

It may seem from the form of the constrained derivative that this definition contains an
element of circularity, since the $\epsilon_{ij}$ are defined relative to the average effects in (22). Any
such concern should be dispelled by the fact that (26) fully encodes our argument from
(22) to (25), which provides an unambiguous sequence of instructions for the theoretical
experimenter to follow. The Appendix provides some numerical examples.

8. Average Effects and Natural Selection 

At this point the reader may be questioning the need for defining the average effect in
terms of causality, as might be revealed by experimentally controlled gene substitutions.
Modern texts give only the regression definition (Lynch & Walsh, 1998; Bürger, 2000),
and those who are accustomed to these accounts may resist the new notation and new
way of thinking.

We have already given one strong motivation to adopt the criterion of sensitivity
to experimental manipulation: the need to distinguish a causal variant from the
non-causal markers in LD with it. Another motivation is that dependence of genotypes and
environments is a frequent occurrence. For instance, a major concern in GWAS is ensuring
that discovered associations are not attributable to population stratification, which is
essentially a form of confounding. A well-known apocryphal example is the "chopstick
gene." A geneticist performing a GWAS of chopstick skill in a large sample containing
both Europeans and East Asians will undoubtedly find many marker loci failing to satisfy
the equality

$$\mathbb{E}(Y \mid A_1A_1) = \mathbb{E}(Y \mid A_1A_2) = \mathbb{E}(Y \mid A_2A_2) \tag{27}$$

even if, unbeknownst to the geneticist, the corresponding equality (3) is obeyed at all loci
linked to the statistically significant markers. This is because the Europeans and East
Asians differ both in allele frequencies at these loci and in the prevalence of chopstick use;
the latter difference presumably has arisen for reasons having nothing to do with genetics.
A regression of the observed phenotypic values on gene count will nevertheless lead to a
nonzero "average effect" in violation of both Fisher's verbal definition and common sense.

GWAS investigators attempt to control confounding by including all other genotyped
markers in the regression. Since the number of genotyped markers typically exceeds
the sample size, techniques such as principal components and mixed linear modeling are
typically employed (Price et al., 2006; Zhou & Stephens, 2012). The reason for the frequent
effectiveness of these techniques is that genomic background becomes an extremely good
proxy for the subpopulation to which a given sample member belongs as the number of
loci grows large (Edwards, 2003). However, one can construct examples where partialing
out other loci fails to deal with confounding (Mathieson & McVean, 2012), and in any
case a theoretical definition whose usefulness depends on contingent quantities such as
genome size and genetic diversity is inherently unattractive.
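
The "chopstick gene" scenario is easy to reproduce in a toy simulation. The sketch below
assumes two hypothetical subpopulations of equal size that differ in the frequency of a
marker allele and in mean skill, with the phenotype generated independently of genotype
within each subpopulation. The group labels, allele frequencies, and skill means are all
illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group = 20_000

# Hypothetical subpopulations: frequency of A2 at a marker, and mean skill
# (the skill difference is cultural, not genetic, by construction).
a2_freq = {"group_1": 0.2, "group_2": 0.8}
mean_skill = {"group_1": 0.0, "group_2": 2.0}


def slope(x, y):
    """Least-squares slope in the regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


gene_count, skill, within_slope = {}, {}, {}
for group in a2_freq:
    gene_count[group] = rng.binomial(2, a2_freq[group], size=n_per_group)
    # Skill does not depend on gene count: the causal average effect is zero.
    skill[group] = mean_skill[group] + rng.normal(size=n_per_group)
    within_slope[group] = slope(gene_count[group], skill[group])

pooled_slope = slope(
    np.concatenate(list(gene_count.values())),
    np.concatenate(list(skill.values())),
)
```

The pooled regression yields a clearly nonzero slope, whereas the slope within each
subpopulation is close to zero, in keeping with the absence of any causal effect.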

Perhaps the most conspicuous failure of the regression definition occurs in the very 
situation that motivated Fisher to define the average effect. This is when the pheno- 
type is fitness itself. In this case the regression average effect will generically fail to be 
proportional to the partial change in genetic mean per change in allele frequency even 
if the genotypic and environmental causes of fitness variation are additive and initially 
independent. 

A simple simulation will bear out this perhaps surprising claim. The simulated
organism follows a life cycle consisting of non-overlapping generations. The population size
is 20,000. Fitness is determined by a single locus and the environment; the frequency of
$A_2$ is initially 1/2, and the population mated at random in the previous generation. The
genotypic fitnesses, the values of $\mathbb{E}[Y \mid \mathrm{do}(A_1A_1)]$, $\mathbb{E}[Y \mid \mathrm{do}(A_1A_2)]$, and $\mathbb{E}[Y \mid \mathrm{do}(A_2A_2)]$, are
.4, .5, and .6 respectively. We determine the phenotypic fitness of each individual in the
following way. Immediately after fertilization but before the onset of viability selection, an
environmental disturbance of .3 in absolute value is added to each individual's genotypic
fitness. Positive and negative disturbances are equally probable. This scheme ensures
that genotypes and environments are independent at this time.

Whether an individual withstands viability selection to mate with a random fellow
survivor is determined by a discrete approximation of an exponential process. We
stipulate ten discrete time intervals between fertilization and reproduction, each of which
an individual survives with a probability chosen so that the probability of surviving all
ten intervals is equal to the individual's phenotypic fitness. By dividing the time
between fertilization and mating into more intervals, we could more closely approach a true
continuous-time model, where the logarithm of phenotypic fitness would be similar to the
Malthusian parameter. Ten intervals, however, suffice to make the point at issue.

Table 2: Evolutionary change across time intervals in a simulated organism.

                  β       Δp             ΔA
fertilization    .100    9.33 × 10^-3   1.87 × 10^-3
time 1           .091    8.07 × 10^-3   1.61 × 10^-3
time 2           .084    5.86 × 10^-3   1.17 × 10^-3
time 3           .079    6.11 × 10^-3   1.22 × 10^-3
time 4           .073    5.08 × 10^-3   1.02 × 10^-3
time 5           .069    4.72 × 10^-3   9.45 × 10^-4
time 6           .065    4.10 × 10^-3   8.20 × 10^-4
time 7           .062    3.47 × 10^-3   6.95 × 10^-4
time 8           .060    3.00 × 10^-3   5.91 × 10^-4
time 9           .059    1.28 × 10^-3   2.57 × 10^-4
time 10          .060    -              -

Table 2 shows the evolution of this population from fertilization to mating. The first
column gives the time interval. The second column gives the regression average effect,
the slope in the regression of phenotypic values on $A_2$ gene count among those individuals
alive at the beginning of the time interval; $\beta$ is the conventional notation for a regression
coefficient. The third column gives the change in $A_2$ frequency from the beginning of the
current time interval to the beginning of the next. The fourth column gives the change in
the mean genotypic fitness from the beginning of the current time interval to the beginning
of the next. Because the effect of substituting $A_2$ for $A_1$ does not depend on the allelic
type of the undisturbed gene, the experimental average effect is of course .10. In this case
of additive gene action, the genotypic value is the same as the "breeding" or "additive
genetic" value, which is now often denoted by the symbol $A$.

Immediately after fertilization, the regression and experimental average effects
coincide, as expected from the fact that genetic values and environmental disturbances are
initially independent. The change in mean genetic value from fertilization to the
beginning of the first time interval is equal to two times the experimental average effect times
the change in allele frequency. The relation $\Delta A = 2\alpha\Delta p$ in fact holds for each transition
from one time interval to the next. The relation $\Delta A = 2\beta\Delta p$, however, does not hold for
any transition besides the first. Note the decline in $\beta$, far greater and more systematic
than can be explained by sampling fluctuations, with the passage of time.
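
The life cycle described above can be sketched in a few lines. The code below is not the
authors' simulation; it is a minimal reimplementation under the stated assumptions
(population size 20,000, genotypic fitnesses .4, .5, .6, disturbances of ±.3, ten survival
intervals), with an arbitrary random seed and hypothetical variable names. The exact numbers
will therefore differ from those in Table 2, although the qualitative pattern should be the same.

```python
import numpy as np

rng = np.random.default_rng(2013)
n = 20_000
n_intervals = 10

# Fertilization after random mating with the frequency of A2 equal to 1/2.
a2_count = rng.binomial(2, 0.5, size=n)
genotypic = 0.4 + 0.1 * a2_count                           # .4, .5, .6
phenotypic = genotypic + rng.choice([-0.3, 0.3], size=n)   # independent disturbance
# Per-interval survival probability whose tenth power is the phenotypic fitness.
survival_prob = phenotypic ** (1.0 / n_intervals)


def slope(x, y):
    """Least-squares slope in the regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


alive = np.ones(n, dtype=bool)
beta, a2_freq, mean_genotypic = [], [], []
for t in range(n_intervals + 1):          # t = 0 is fertilization
    beta.append(slope(a2_count[alive], phenotypic[alive]))
    a2_freq.append(a2_count[alive].mean() / 2.0)
    mean_genotypic.append(genotypic[alive].mean())
    if t < n_intervals:
        alive &= rng.random(n) < survival_prob   # viability selection

delta_p = np.diff(a2_freq)
delta_a = np.diff(mean_genotypic)
```

In this sketch `delta_a` equals `0.2 * delta_p` at every transition, as required by
$\Delta A = 2\alpha\Delta p$ with $\alpha = .10$, while `beta` starts near .10 and declines among
the survivors.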

What explains the increasing discrepancy between $\alpha$ and $\beta$? This is an example of
what some methodologists call selection bias (e.g., Pearl, 2009). Suppose that intelligence
and athletic ability are uncorrelated in the population at large. However, if we limit
our observations to the students attending a university that uses both of these attributes
as admissions criteria, then we will find that intelligence and athleticism are negatively
correlated. If we learn that a student at this university is academically undistinguished,
then it becomes more probable that the student is a good athlete. Otherwise the student
would likely not have been admitted.
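
The university example can be reproduced with a short simulation. The sketch below assumes
standard normal, independent scores and a hypothetical admissions rule based on the sum of
the two attributes; the threshold of 1.5 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

intelligence = rng.normal(size=n)
athleticism = rng.normal(size=n)     # independent of intelligence by construction

# Hypothetical admissions rule: both attributes count toward admission.
admitted = (intelligence + athleticism) > 1.5

corr_population = np.corrcoef(intelligence, athleticism)[0, 1]
corr_admitted = np.corrcoef(intelligence[admitted], athleticism[admitted])[0, 1]
```

The correlation is near zero in the whole population but distinctly negative among the
admitted students, purely as a consequence of conditioning on admission.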

Similarly, if there is some relation between fitness at different points of the lifespan,
then with the passage of time the genetic and environmental causes of fitness will tend
to become correlated even if they were initially independent. If we learn that a
particular survivor of a rigorous selection scheme has an unfit genotype, then it becomes more
probable that the organism has benefited from a favorable environment. This same
principle explains why selection tends to induce deviations from Hardy-Weinberg and linkage
equilibrium (Bulmer, 1980; Nagylaki, 1992; Bürger, 2000; Ewens, 2004): if we find that a
survivor has an unfit gene at one genomic position, it becomes more probable that the
survivor bears fit genes at other positions. As stated previously, the dependence of
genotypes and environment leads to a divergence between the experimental and regression
average effects, and the latter then has no straightforward genetic interpretation.

It is important to note that our example does not necessarily impugn the validity of
the FTNS, under the regression definition of the average effect, with respect to organisms 
living in discrete time. This is because in this model the FTNS has come to be interpreted 
as concerning the change in mean breeding value between generations, and the correctness 
of the FTNS is preserved when the mean is measured upon fertilization and the regression 
average effect is measured at the beginning of the parental generation. However, because 
our model places deaths along a temporal dimension between birth and mating, it should 
properly be classified as a continuous-time model. The FTNS is intended to apply at every 
point in continuous time, and therefore our argument for the experimental definition of 
the average effect retains its full force for organisms following such a life cycle. 

Fisher knew that selection bias with respect to the outcome variable prevents
regression coefficients from being interpretable. In Statistical Methods for Research Workers,
he pointed out that the application of a selection process to the outcome variable will
change the regression line (Fisher, 1970, p. 130). It is thus rather curious that Fisher
never mentioned this principle in connection with natural selection, a form of selection
bias that is always and everywhere operating.

The regression definition is made viable by stipulating the use of "true" or "intrinsic" 
phenotypic measurements as the outcome variable rather than the actual measurements. 
This approach, which we adopt in the Appendix, may be natural and inevitable in the 
case of multiple loci. Because of the need to know the residuals in the multilocus case, it 
does not seem possible to banish the concept of least-squares linear regression from the 
theory of average effects. The concepts of regression and causality need to work together. 
Needless to say, the notion of causality remains an essential partner in this collaboration. 
A definition calling for the regression of "true" phenotypic measurements on gene content 
really amounts to replacing the observed phenotypic means of the three genotypes in (7)
with the experimental means, which requires the same do operator incorporated in (11)
and (15). The instance of do in (26) actually covers two points where we must invoke
experimental control: once in the determination of the genotypic means, breeding values, 
and non-additive residuals, and again in the replacement of randomly chosen homologous 
genes to resolve the non-uniqueness of the individual average effects. To capture what 
Fisher intended by the average effect in a formal and transparent way, we cannot easily 
avoid a special notation for singling out causal relations from merely correlational ones. 

9. Discussion 



Falconer (1985) had the good sense to intuit that sensitivity to physical change was
important to Fisher's conception of the average effect. Indeed, among all twentieth-century
scientists, Fisher might have been the one most likely to incorporate the distinction be- 
tween an observed excess and a causal effect into a formal theory. The discrepancy 
that Falconer thought he had uncovered between Fisher's regression and experimental 
definitions of the average effect can be reconciled, in the case of genotype-environment 
independence, by using a specific weighted average of the two possible gene substitutions 
rather than a naive average. If the phenotype is affected by one biallelic locus, the weights 
are chosen so that a population subject to gene substitutions in numbers proportional to 
the weights retains the same value of $\lambda = Q^2/(PR)$, a parameter describing the way in
which alleles are combined into genotypes. If genotypes and environments interact non- 
additively, then the gene substitutions must also be balanced with respect to the marginal 
distribution of environmental states. This balancing has the desirable property of pre- 
serving the experimentally ascertained phenotypic means of the genotypes. In the case of 
multiple loci, there is no longer a fixed parameterization of genotype formation to which 
the weightings of the gene substitutions must conform, but in a loose sense the changes in 
the departures from random combination must average out to zero. These restrictions are
requirements for a change in allele frequency "without change in the environment, or in
the mating system [rules of genotype combination]." When genotypes and environments
are dependent, which must always be the case, even if only slightly, as a result of natural
selection, the experimental definition is to be preferred.



Fisher (1941) gave one reason why a definition based on experimental gene substitutions
may be inferior to one based on passive observations of a static population (although
later in this paper he reverted to the language of gene substitutions). He pointed out
that changes in the frequencies of the different genotypes may feed back to change the
phenotypic means themselves. He gave the example of experimental gene substitutions
increasing milk yield, which lead to females in the next generation who can leverage their
superior nourishment to provide even more milk to their own offspring. Fisher wished
to discount such knock-on effects, presumably because they are too complex to form
general rules about them. These knock-on effects can be positive or negative. When
fitnesses are frequency-dependent, the knock-on effects of naturally selected changes in
allele frequencies can steadily decrease the mean fitness of the population (Nowak, 2006).
The approach of a female-skewed sex ratio to a stable fifty-fifty equilibrium in a polygynous
species can be an example of precisely this phenomenon (Fisher, 1930, pp. 141-143;
Bennett, 1983, p. 232). Therefore Fisher consigned changes in the genotype-phenotype
mapping (the E[Y | do(ij)]) brought about by gene substitutions with all other possible
such changes, including those brought about by unpredictable changes in climate, predators,
parasites, and so on. Our preferred resolution of the dilemma raised by the cascade
of additional phenotypic changes that may be initiated by a physical gene substitution
is to stipulate the constancy of (1) and (5), for instance, in the experimental definition
of the average effect. That is, the average effect is calculated on the assumption that
the prevailing genotype-phenotype mapping will not itself change as a result of the gene
substitutions. This is equivalent to the stable unit treatment value assumption (SUTVA)
in the Neyman-Rubin counterfactual framework.

SUTVA may often have a reasonable interpretation. For example, in the cases of 
fecundity selection and frequency-dependent fitnesses of game-theoretic strategies, we may 
interpret each causal effect as the expected phenotypic change upon placing a manipulated 
organism in a virtual environment containing the same mixture of types constituting 
the undisturbed population. In any event finding an interpretation of SUTVA may not 
be important in most biological situations, so long as any frequency-dependent changes
ensuing from the experimental manipulation of a few individuals can be neglected in a 
theoretically infinite population. 

It is the constancy of the E[Y | do(ij)] rather than the constancy of the corresponding
observed phenotypic means that is satisfied by the gene substitutions defining the average 
effect in the case of genotype-environment dependence and interaction. This striking fact 
further affirms the priority of causal quantities over observables that may have no causal 
interpretation. 

A renewed understanding of the average effect is especially timely given the enablement
of GWAS by modern technology and the upsurge of research into the inheritance of
fitness in human populations (Stearns et al., 2010). The findings of the ENCODE Project
Consortium (2012) indicate that the fine-mapping of the variants with nonzero experimental
average effects responsible for a given association signal may turn out to be less
onerous than was once supposed. However, care is needed as researchers isolate vari- 
ants with ever smaller average effects, which will be difficult to distinguish from spurious 
signals generated by subtle confounding or selection bias. 

An appealing feature of GWAS is the availability of a complementary study design,
pioneered by Spielman et al. (1993), that offers nearly the entirety of the benefits inhering
in experimental control. According to Mendel's laws, a parent passes on a randomly chosen 
gene from each of its homologous pairs to a given offspring. Given the applicability of 
Mendel's laws, we can then treat the genotype of an offspring given the parental genotypes 
much like a treatment in a randomized experiment. It follows that a significant association 
between transmission of a particular allele and the focal phenotype cannot be the result 
of confounding; in the absence of selection bias, the only feasible explanation is linkage 
with a locus where the average effect is nonzero. Fisher himself noted this feature of 
family-based studies: 

Genetics is indeed in a peculiarly favoured condition in that Providence has 
shielded the geneticist from many of the difficulties of a reliably controlled 
comparison. The different genotypes possible from the same mating have 
been beautifully randomized by the meiotic process. A more perfect control 
of conditions is scarcely possible, than that of different genotypes appearing
in the same litter. (Fisher, 1952, p. 7)
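The logic of treating transmission as a randomized treatment can be made concrete. The following is a minimal sketch, assuming the usual McNemar-type form of the family-based test: among heterozygous parents, `b` counts transmissions of the focal allele and `c` counts transmissions of the alternative allele, and under Mendel's laws each transmission is a fair coin flip. The function name and the counts are hypothetical and serve only as illustration.

```python
import math

def tdt_statistic(b: int, c: int) -> tuple:
    """McNemar-type transmission test.

    b: heterozygous parents transmitting the focal allele
    c: heterozygous parents transmitting the alternative allele
    Returns (chi-square statistic, approximate two-sided p-value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p = tdt_statistic(b=60, c=40)
```

Because the statistic conditions on parental genotypes, population stratification changes how many heterozygous parents are available but not the expected balance between `b` and `c` under the null hypothesis.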



Family-based studies have successfully been used to replicate findings from studies of
nominally unrelated individuals (Lango Allen et al., 2010; Turchin et al., 2012), and this is
another way in which the thought experiments defining the average effect are becoming less
like Gedanken and more like routine empirical operations. We note that when Spielman
et al. (1993) introduced their family-based test, their null hypothesis was no linkage with
a causal locus despite the presence of population association. This test and its variants 
have since often been used to test the null hypothesis that there is neither linkage nor 
association. We anticipate that there will be a trend back toward the original form of 
the test. Because parent-offspring trios and sets of siblings can be difficult to recruit and 
require more genotyping, investigators find it convenient to test for population association 
in large samples of unrelated individuals. Those markers showing evidence of association
can then be interrogated, however, for linkage with loci where there are nonzero average 
effects. The follow-up cohorts of families will typically be much smaller and less likely to 
yield genome-wide significant p-values, but it will be reasonable to require less stringent 
evidence or merely overall sign agreement greatly exceeding 50 percent. This procedure 
can provide a check on whether the association stage is producing an acceptably low 
rate of false positives with respect to the causal hypothesis of a nonzero average effect — 
which, of course, is not strictly the same as the statistical hypothesis of a nonzero partial 
regression coefficient. 
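The criterion of overall sign agreement can be read as a one-sided sign test. The sketch below assumes that the follow-up markers are independent and that, if none of them is linked to a locus with a nonzero average effect, each family-based estimate agrees in sign with the population estimate with probability 1/2. The function name and the counts are hypothetical.

```python
from math import comb

def sign_agreement_p_value(n_agree: int, n_total: int) -> float:
    """One-sided tail P(X >= n_agree) for X ~ Binomial(n_total, 1/2)."""
    if not 0 <= n_agree <= n_total:
        raise ValueError("n_agree must lie between 0 and n_total")
    tail = sum(comb(n_total, x) for x in range(n_agree, n_total + 1))
    return tail / 2 ** n_total

# Hypothetical follow-up: 15 of 20 associated markers keep their sign in families.
p = sign_agreement_p_value(n_agree=15, n_total=20)
```

A small tail probability suggests that the association stage is not dominated by false positives with respect to the causal hypothesis, even when no single marker reaches significance in the families.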

We note that family-based studies are not immune to selection bias intervening be- 
tween fertilization and the time of measurement, which may rise to an appreciable level 
in studies of phenotypes strongly affecting fitness. This may be a challenge for gene-trait 
mapping studies conducted in the near future. 

It may be tempting to define the average effect in terms of a hypothetical family-based 
study. However, whereas rejecting the null hypothesis of a zero average effect requires 
only the assumptions of Mendel's laws, effect estimation requires additional assumptions 
and thus does not seem particularly suited for a theoretical definition after all (Ewens
et al., 2008).

Finally, we comment on the role of the average effect in the FTNS. We write the
breeding (additive genetic) value of a given individual as

$$A = \sum_{k=1}^{L} \sum_{i_k=1}^{n_k} x^{(k)}_{i_k} \alpha^{(k)}_{i_k}, \qquad (28)$$

where $x^{(k)}_{i_k}$ is a function giving the number of $A^{(k)}_{i_k}$ genes (0, 1, or 2) present in the
individual's genotype. The variance in breeding values, Var(A), is now called the additive
genetic variance, and the ratio Var(A)/Var(Y) the heritability in the narrow sense. It is
important to keep in mind that these breeding values are linear functions of experimental
average effects; we are building up a predicted value for a given individual from the causal
effects of the genes present in the genotype.
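The breeding value is a plain sum of allele counts times average effects, which a short sketch can make explicit. The average effects and the genotype below are hypothetical numbers chosen only for illustration; the function name is likewise an assumption.

```python
def breeding_value(allele_counts, average_effects):
    """Sum over loci k and alleles i_k of x[k][i] * alpha[k][i].

    allele_counts[k][i]   : copies (0, 1, or 2) of allele i at locus k
    average_effects[k][i] : average effect of allele i at locus k
    """
    total = 0.0
    for counts_k, effects_k in zip(allele_counts, average_effects):
        if sum(counts_k) != 2:
            raise ValueError("a diploid carries exactly two genes per locus")
        total += sum(x * a for x, a in zip(counts_k, effects_k))
    return total

# Hypothetical average effects at two loci (two and three alleles).
alpha = [[-0.5, 0.5], [-0.2, 0.1, 0.3]]
# Hypothetical individual: A1A2 at the first locus, A3A3 at the second.
x = [[1, 1], [0, 0, 2]]
A = breeding_value(x, alpha)
```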

The FTNS states that the partial change in mean fitness attributable to changes
in allele frequencies caused by natural selection is proportional to the additive genetic
variance in fitness, which can be shown to equal

$$2 \sum_{k=1}^{L} \sum_{i_k=1}^{n_k} p^{(k)}_{i_k} a^{(k)}_{i_k} \alpha^{(k)}_{i_k}, \qquad (29)$$

where the meaning of $a^{(k)}_{i_k}$ is as follows. If genotypes and environments are independent,
then this quantity is the average excess of $A^{(k)}_{i_k}$, which is usually defined as the difference in
mean fitness between the bearers of the given allele and the entire population. (29) is
invariably derived under the assumption that genotypes and environments are independent.
Because under our definitions the values of the experimental average effects do not depend
on the extent of genotype-environment dependence, it follows that the breeding values
and hence the additive genetic variance are also insensitive to genotype-environment
dependence. The equality of (29) with Var(A) is thus fully valid in our account, given the
following modification regarding $a^{(k)}_{i_k}$.

If genotypes and environments are not independent, $a^{(k)}_{i_k}$ in (29) is not exactly the same
as the average excess defined by Fisher (1958b, p. 35). It is rather the average excess
that would be observed if genotypes were distributed randomly among environments.
In other words each $a^{(k)}_{i_k}$ only reflects confounding with other genetic loci and not with
environmental causes. To repeat, this is a consequence of the fact that our experimental
average effects (and hence all quantities derived from them, including the additive genetic
variance) are sensitive only to the marginal distribution of environmental states. Every
factor in (29), including the $a^{(k)}_{i_k}$, must therefore be equal to whatever they would be under
genotype-environment independence, the standard setting in which (29) is calculated. If
the "full" average excesses were substituted into (29), then the expression would no longer
be interpretable as a variance; it could then possibly be negative.
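As a numerical illustration of (29), consider a single biallelic locus with genotype frequencies P, 2Q, R that need not be in Hardy-Weinberg proportions. The sketch below assumes genotype-environment independence, so that the genotypic means i, j, k can be read as experimental means, and it uses the least-squares slope for this case. The function name and the numbers are hypothetical. It computes the variance of the breeding values directly and compares it with twice the sum of products of allele frequency, average excess, and average effect.

```python
def additive_variance_two_ways(P, Q, R, i, j, k):
    """Genotype frequencies: P (A1A1), 2Q (A1A2), R (A2A2); means i, j, k."""
    assert abs(P + 2 * Q + R - 1.0) < 1e-12
    p1, p2 = P + Q, Q + R                      # allele frequencies
    mu = P * i + 2 * Q * j + R * k             # population mean
    # Least-squares effect of substituting A2 for A1.
    alpha = (P * (Q + R) * (j - i) + R * (P + Q) * (k - j)) / (P * Q + Q * R + 2 * P * R)
    # Individual average effects, centred so that p1*alpha1 + p2*alpha2 = 0.
    alpha1, alpha2 = -p2 * alpha, p1 * alpha
    # Average excesses: mean of the bearers of each allele minus the population mean.
    a1 = (P * i + Q * j) / p1 - mu
    a2 = (Q * j + R * k) / p2 - mu
    var_breeding = (P * (2 * alpha1) ** 2 + 2 * Q * (alpha1 + alpha2) ** 2
                    + R * (2 * alpha2) ** 2)
    fisher_sum = 2 * (p1 * a1 * alpha1 + p2 * a2 * alpha2)
    return var_breeding, fisher_sum

# Hypothetical population with an excess of homozygotes.
var_a, expr_29 = additive_variance_two_ways(0.5, 0.1, 0.3, 1.0, 4.0, 2.0)
```

The two returned numbers agree even though the population departs from random mating, which is the sense in which the expression is interpretable as a variance.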

It is well known that the change in the frequency of $A^{(k)}_{i_k}$ is proportional to the product
of $p^{(k)}_{i_k}$ and the actual difference in mean fitness between the bearers of the given allele
and the entire population (e.g., Price, 1970). From the fact that the difference is not
necessarily equal to our $a^{(k)}_{i_k}$, we learn that there is a partition of the total change in allele
frequency between the change caused by natural selection and the change attributable
to how genotypes are distributed across environments varying in severity. This partition
is in the same spirit as Fisher's conditions discussed previously. Like changes in the
rules of genotype formation or the E[Y | do(ij)], deviations from genotype-environment
independence cannot generally lead to an increase in fitness, and indeed the example set
out in Table 2 demonstrates that the dependence induced by natural selection itself tends
to retard the frequency increase of the superior allele.

Each increment of naturally selected change in allele frequency is a direct cause of a 
change in the mean fitness equal to 2^^^. Any discrepancy between the total change and 
this partial change, summed over all loci and alleles, is owed to indirect effects acting 
through changes in the rules of genotype formation, the distribution of environmental 
states, or some other determinant of fitness. This completes the FTNS: the increase 
in the mean fitness of a population caused exclusively by the effect of natural selection 
on allele frequencies — setting aside those changes in fitness (which can be positive or 
negative) ascribable to other causes — is equal to the additive genetic variance in fitness. 

Fisher's contributions to biology and applied mathematics were of course numerous 
and profound. Judging from his writing in The Genetical Theory, however, we surmise 
that he considered the FTNS to be the most important of his achievements. The FTNS
quantifies Darwin's notion of hereditary variation in fitness leading to adaptation and
provides a deeper understanding of it. It is interesting that (29), Fisher's "supreme law of
the biological sciences," explicitly encodes a distinction between an observed excess and a
causal effect, the same distinction that animated his work on experimental design, which
Neyman (1967) praised as the greatest of Fisher's contributions to statistics. The FTNS
was thus another blow struck by Fisher against his scientific adversary Karl Pearson,
who believed it was possible both to study evolution mathematically and to discard the
notion of causality. If causality appears inevitably in the formulation of a phenomenon
as fundamental as evolution by natural selection, then it surely cannot be a dispensable
"fetish amidst the inscrutable arcana of modern science" (Pearson, 1911, p. xii).

Appendix

Here we explicitly derive the conditions under which the regression and experimental 
definitions of average effect are equivalent. We assume that the equivalence can always be 
secured in a meaningful way, either because genotypes and environments are independent 
or because the regression has been performed on the experimental genotypic means rather 
than the observed genotypic means. We will often refer to an experimental average effect in 
the sense of an arbitrary linear combination of relevant causal effects (differences between 
genotypic means) and narrow down our reference to particular linear combinations as the
given argument proceeds. We first treat the case of a single biallelic locus, which is of
special interest because it is possible here to find explicit expressions for the weights $c_1$
and $c_2$ in (11).



Let i stand for $E[Y \mid do(A_1A_1)]$, j for $E[Y \mid do(A_1A_2)]$, and k for $E[Y \mid do(A_2A_2)]$. This
notation is similar to that of Fisher (1918, 1941). By using the do symbol, however, our
argument below is meaningful even if genotypes and environments are dependent and
non-additive.

To minimize the sum of squares

$$P(i - \nu + \alpha)^2 + 2Q(j - \nu)^2 + R(k - \nu - \alpha)^2,$$

we take partial derivatives with respect to $\nu$ and $\alpha$ and set them equal to zero. Solving
the two resulting equations gives

$$\alpha = \frac{P(Q + R)(j - i) + R(P + Q)(k - j)}{PQ + QR + 2PR}, \qquad (A1)$$

which can easily be recognized as equivalent to (11) in the case that genotypes and
environments act additively. Using (14) to expand each experimental mean, we find that
the numerator of (A1) becomes



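$$c_1 \left[ \sum_{l} \Pr(\mathcal{E}_l) E(Y \mid A_1A_2, \mathcal{E}_l) - \Pr(\mathcal{E}_l) E(Y \mid A_1A_1, \mathcal{E}_l) \right] + c_2 \left[ \sum_{l} \Pr(\mathcal{E}_l) E(Y \mid A_2A_2, \mathcal{E}_l) - \Pr(\mathcal{E}_l) E(Y \mid A_1A_2, \mathcal{E}_l) \right], \qquad (A2)$$
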

which means that (A1) is also equivalent to (15).
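The closed form (A1) can be checked against a direct solution of the two normal equations. The sketch below is only an illustration: the intercept of the regression is written `nu`, the numerical values are hypothetical, and the genotype frequencies are P, 2Q, R as in the text.

```python
def alpha_closed_form(P, Q, R, i, j, k):
    """Average effect of substituting A2 for A1 according to (A1)."""
    return (P * (Q + R) * (j - i) + R * (P + Q) * (k - j)) / (P * Q + Q * R + 2 * P * R)

def alpha_least_squares(P, Q, R, i, j, k):
    """Minimize P(i-nu+a)^2 + 2Q(j-nu)^2 + R(k-nu-a)^2 over (nu, a).

    Normal equations:
      d/d nu: nu*(P + 2Q + R) + a*(R - P) = P*i + 2Q*j + R*k
      d/d a : nu*(R - P)      + a*(P + R) = R*k - P*i
    solved here by Cramer's rule.
    """
    a11, a12, b1 = P + 2 * Q + R, R - P, P * i + 2 * Q * j + R * k
    a21, a22, b2 = R - P, P + R, R * k - P * i
    det = a11 * a22 - a12 * a21
    return (a11 * b2 - a21 * b1) / det

# Hypothetical frequencies (P + 2Q + R = 1) and experimental genotypic means.
args = (0.5, 0.1, 0.3, 1.0, 4.0, 2.0)
closed = alpha_closed_form(*args)
fitted = alpha_least_squares(*args)
```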

Now consider the change in the mean phenotype caused by experimental gene substi- 
tutions. The contribution to the population mean phenotype by the experimental means 
of the genotypes is given by 

$$\mu = iP + 2jQ + kR, \qquad (A3)$$

and the change in the population mean upon effecting the gene substitutions is

$$d\mu = i\,dP + 2j\,dQ + k\,dR. \qquad (A4)$$

The changes dP, dQ, dR have two degrees of freedom. To express the changes in terms 
of a single change dp, we must obtain another condition, which can be expressed without
loss of generality as f(P, Q, R) = 0. Fisher (1941) gave the condition that $\lambda = Q^2/(PR)$
remains constant, but his concise argument has puzzled many commentators.

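It turns out that Fisher set $d\mu = i\,dP + 2j\,dQ + k\,dR$ equal to $2\alpha\,dp$ and equated the
coefficients of i, j, k (Edwards, 1967), which yields

$$dP = -2P(Q + R)\,dp/S,$$
$$dQ = Q(P - R)\,dp/S,$$
$$dR = 2R(P + Q)\,dp/S, \qquad (A5)$$

where $S = P(Q + R) + R(P + Q)$. The function f satisfies the differential equation

$$df = \frac{\partial f}{\partial P}\,dP + \frac{\partial f}{\partial Q}\,dQ + \frac{\partial f}{\partial R}\,dR = 0. \qquad (A6)$$

Inserting (A5) into (A6) gives

$$-2P(Q + R)\frac{\partial f}{\partial P} + Q(P - R)\frac{\partial f}{\partial Q} + 2R(P + Q)\frac{\partial f}{\partial R} = 0. \qquad (A7)$$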

Now $(-2P(Q + R),\ Q(P - R),\ 2R(P + Q))$ and $(\partial f/\partial P,\ \partial f/\partial Q,\ \partial f/\partial R)$ can be regarded as two
orthogonal vectors in three-space. We want the second condition to be independent of the
conservation of probability condition and not to be the trivial zero vector. By inspection,
we see that a solution is given by

$$\frac{\partial f}{\partial P} = \frac{\phi}{P}, \qquad \frac{\partial f}{\partial Q} = \frac{-2\phi}{Q}, \qquad \frac{\partial f}{\partial R} = \frac{\phi}{R}, \qquad (A8)$$

where $\phi$ is an arbitrary function of P, Q, R. A simple solution is given by setting $\phi$ equal
to the constant a, whereupon (A8) can be integrated to obtain

$$f = -2a \ln Q + a \ln P + a \ln R + a \ln \lambda, \qquad (A9)$$

which gives the condition $Q^{2a} = (\lambda P R)^{a}$. Setting a = 1 gives the condition expressed in terms
of the classic Fisher parameter. Conversely, if we let $\phi = PRQ^{-2}$ then we get $f =
PRQ^{-2} - (1/\lambda)$, which also gives the Fisher parameter.

Taking the partial second derivatives gives the compatibility conditions that $\phi$ must
satisfy:

$$\frac{1}{P}\frac{\partial \phi}{\partial R} = \frac{1}{R}\frac{\partial \phi}{\partial P}, \qquad
\frac{1}{P}\frac{\partial \phi}{\partial Q} = \frac{-2}{Q}\frac{\partial \phi}{\partial P}, \qquad
\frac{1}{R}\frac{\partial \phi}{\partial Q} = \frac{-2}{Q}\frac{\partial \phi}{\partial R}. \qquad (A10)$$

Hence, any differentiable function of $PRQ^{-2}$ is a solution. This then implies that f can
be any differentiable function of $PRQ^{-2}$ as well. This shows that the average phenotypic
increment caused by a number of experimental gene substitutions is the same as the slope
in the regression of the phenotype on the experimental genotypic means if the substitutions
are performed in a background where any function of $PRQ^{-2}$ is held constant, with $\lambda$
being the simplest one.
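The single-locus argument can be checked numerically. The sketch below applies the frequency changes of (A5) for a small change dp in the frequency of A2 and verifies that probability is conserved, that the change in the mean equals 2 alpha dp, and that Q^2/(PR) is unchanged to first order. The starting frequencies and genotypic means are hypothetical.

```python
def substitution_step(P, Q, R, dp):
    """Genotype-frequency changes (A5) for a change dp in the frequency of A2."""
    S = P * (Q + R) + R * (P + Q)
    dP = -2 * P * (Q + R) * dp / S
    dQ = Q * (P - R) * dp / S
    dR = 2 * R * (P + Q) * dp / S
    return dP, dQ, dR

P, Q, R = 0.5, 0.1, 0.3      # hypothetical frequencies with P + 2Q + R = 1
i, j, k = 1.0, 4.0, 2.0      # hypothetical experimental genotypic means
dp = 1e-6
dP, dQ, dR = substitution_step(P, Q, R, dp)

# Regression slope (A1) and the induced change in the mean (A4).
alpha = (P * (Q + R) * (j - i) + R * (P + Q) * (k - j)) / (P * Q + Q * R + 2 * P * R)
d_mu = i * dP + 2 * j * dQ + k * dR

# Fisher's parameter before and after the step.
lam_before = Q ** 2 / (P * R)
lam_after = (Q + dQ) ** 2 / ((P + dP) * (R + dR))
```

The residual change in the parameter is of order dp squared, as expected from a first-order condition.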

We now treat a phenotype affected by an arbitrary number of multiallelic loci. As 
shown in Section 7, the experimentally determined phenotypic means of the whole-genome 
genotypes can be expressed as 

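$$E[Y \mid do(ij)] = \mu + \alpha_{ij} + \epsilon_{ij}.$$

In the remainder we abbreviate E[Y | do(ij)] as $G_{ij}$ and set $a_{ij} = G_{ij} - \mu$, which obeys
the condition $\sum_{i,j} P_{ij} a_{ij} = 0$.

The average effects can be written as $\alpha_{ij} = \alpha_i + \alpha_j = \sum_k \left( \alpha^{(k)}_{i_k} + \alpha^{(k)}_{j_k} \right)$ and are obtained
by minimizing

$$\sum_{i,j} P_{ij} (a_{ij} - \alpha_{ij})^2. \qquad (A11)$$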

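The minimum obeys the condition

$$p^{(k)}_{i_k} a^{(k)}_{i_k} = \sum_{i \setminus i_k} \sum_{j} P_{ij} a_{ij} = \sum_{i \setminus i_k} \sum_{j} P_{ij} \sum_{l} \left( \alpha^{(l)}_{i_l} + \alpha^{(l)}_{j_l} \right), \qquad (A12)$$

where

$$p^{(k)}_{i_k} a^{(k)}_{i_k} = \sum_{i \setminus i_k} \sum_{j} P_{ij} a_{ij} \qquad (A13)$$

defines the average excesses. A sum running over $i \setminus i_k$ should be understood as a sum
over all multi-indices i where the kth element is fixed to $i_k$. These relations imply that
$\sum_{i,j} P_{ij} \alpha_{ij} = 0$, which also implies that $\sum_{i,j} P_{ij} \epsilon_{ij} = 0$.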
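Equation (A12) can be rewritten as

$$p^{(k)}_{i_k} a^{(k)}_{i_k} = p^{(k)}_{i_k} \alpha^{(k)}_{i_k} + \sum_{l \neq k} \sum_{i_l} p^{(kl)}_{i_k i_l} \alpha^{(l)}_{i_l} + \sum_{l} \sum_{j_l} q^{(kl)}_{i_k j_l} \alpha^{(l)}_{j_l} = \sum_{l} \sum_{j_l} H^{(kl)}_{i_k j_l} \alpha^{(l)}_{j_l}, \qquad (A14)$$

where

$$p^{(kl)}_{i_k i_l} = \sum_{i \setminus i_k, i_l} \sum_{j} P_{ij}$$

denotes the frequency of gametes that carry $A^{(k)}_{i_k}$ and $A^{(l)}_{i_l}$ and

$$q^{(kl)}_{i_k j_l} = \sum_{i \setminus i_k} \sum_{j \setminus j_l} P_{ij}$$

denotes the frequency of all multilocus genotypes that carry $A^{(k)}_{i_k}$ and $A^{(l)}_{j_l}$ on different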
chromosomes. The matrix H in (A14) is constructed as follows. Let p denote the vector
of allele frequencies, a the vector of average excesses, and $\alpha$ the vector of average effects.
These vectors have length $\sum_k n_k$, and their elements are ordered by locus. We can then
define

$$H = D + P + Q, \qquad (A15)$$

where D is the diagonal matrix with the components of p on the diagonal, P is the
matrix with entries $p^{(kl)}_{i_k i_l}$ if $k \neq l$ and 0 otherwise, and Q is the matrix with entries $q^{(kl)}_{i_k j_l}$
(Ewens, 1992; Castilloux & Lessard, 1995). We will use the notation $p \cdot a$ to designate the
component-wise product of the vectors p and a, i.e., $(p \cdot a)_j = p_j a_j$. (A14) can thus be
rewritten again as

$$\alpha = H^{-1}(p \cdot a) \qquad (A16)$$

subject to suitable constraints on $\alpha$. We will shortly see that these constraints turn out
to be (17) for each locus. Given our ordering convention, the element $H^{(kl)}_{i_k j_l}$ lies in the row
of H corresponding to allele $A^{(k)}_{i_k}$ and the column corresponding to $A^{(l)}_{j_l}$.



(i) 



The total change in $\mu$ is



ik i,j Pik ik i-i Pi 



«fc «J 



If, 



EEK+^^.)|%^^Jf (A17) 



«fe *j 



dp] 



«fc 



upon performing a number of experimental gene substitutions at locus k. Agreement of 
the experimental and regression average effects implies that this change must equal the 
change predictable from the breeding values. 



which implies in turn that 



ik i,j Pik 



«J «*: ^tk 



(A18) 



(A19) 
is a necessary and sufficient condition for the experimental and regression average effects
to coincide. The bald statement that the changes in genotype frequencies must somehow
nullify the non-additive residuals, however, is not very revealing. We can render (A19)
into a more insightful form by noting that










(A20) 



because the sum over ^ is a constant determined by the experimenter. Using this, from 
(A19) we obtain



E^. 



1 dPj, 



^ ' Pii Pk 



H 



0, 



(A21) 



which leads to flMj) . This argument, which simplifies one given by Lessard (1997), can be
used to construct a variety of quantities measuring departures from random combination.
The $\theta_{ij}$ appear to be the simplest such quantities.

The criterion (A21) does not pick out a unique weighting of the possible gene substitutions
for a given genetic architecture. It would be of great significance if a subset
of the possible weights could be characterized in a manner that does not depend on the
non-additive residuals. We have done this for a single biallelic locus, where the subset
contains the singleton weighting of the two possible gene substitutions that conserves $\lambda$. If
a general procedure for constructing such a residual-free characterization for any number
of loci exists, then the following argument should be able to find it.



The contribution of the experimental genotypic means to the population mean is

$$\mu = \sum_{i,j} P_{ij} G_{ij}. \qquad (A22)$$

The definition of the experimental average effect can be written as

$$\alpha^{(k)}_{i_k} = \frac{1}{2} \frac{\partial \mu}{\partial p^{(k)}_{i_k}}. \qquad (A23)$$

Imposing constancy of the experimental means, we can write the change in the population
mean due to a change in frequency of allele $A^{(k)}_{i_k}$ as

-^^ = y G,,^ = y (G„- -^)^ = y a,,^, (A24) 

using the fact that ^- • —jkjPij = 0. The indeterminacy in the partial derivatives with 
respect to allele frequency will be resolved by the properties of A in (TT^ that emerge from 
the subsequent analysis. 

Substituting (A24) and (A23) into (A14) using (A13) gives the condition



for each $i_k$ and k. Closed-form solutions of these partial differential equations will not
exist in general. However, using symmetry conditions and properties of H, we may infer 
some necessary conditions on the genotype frequencies that must be satisfied. 

We first note that the image space of H contains all permissible vectors of allele-frequency
changes (Lessard & Castilloux, 1995). Since H is invertible on its image space
we may operate on (A25) by the inverse of H (which we call J) and thereby separate the
PDE system into a set of $\sum_k n_k$ ordinary differential equations, which we denote by

^Y.J^.J^^ = Y.--^- (A26) 

ik m,n '^Pjk 

We may now select any row of (A26), expand the $p^{(k)}_{i_k} a^{(k)}_{i_k}$ in terms of the $a_{mn}$, and equate
the LHS and RHS coefficients of $a_{mn}$. This will result in a set of $\sum_k n_k \times \sum_k n_k$ ordinary
differential equations of the form

1 dP 

^ '-'^ van 



Iran rlrV-. ' 



(pmnUk), (A27) 



where (pmnijk) is some linear combination of the elements of the vector Jj^=j^,i^- From 
this point the $a_{mn} = \alpha_{mn} + \epsilon_{mn}$ no longer appear in the argument, and it follows that we
must be finding properties of a solution that depends on neither the breeding values nor
the non-additive residuals.

Conserved quantities imposed by (A25), which can be used to form elements of $\Lambda$, can
be constructed by taking linear combinations of the ODEs such that

1 dPrr,., 



E 



(Tr, 



P 



dp 



{k) 
jk 



0, 



(A28) 



from which we obtain conserved measures of departure from random combination assum- 
ing the form 



ll{o->0}-^"/3 

ll{f7<0} ^J^ 



Act, 



(A29) 



where $\sigma_{mn}$ is some set of coefficients that are positive, zero, or negative. These conserved
quantities will form a set of necessary conditions for the equivalence of the experimental 
and regression definitions of the average effects. 

Note that the coefficients of $a_{mn}$ on the LHS of (A26) are grouped according to the
$a^{(k)}_{i_k}$. Thus all of the $a_{mn}$ expressed in a given $a^{(k)}_{i_k}$ will have the same coefficient (one of
the elements of J). We can thus construct conserved measures of Hardy-Weinberg and
linkage disequilibrium without an explicit calculation of J because we know which sets of
coefficients are equal.

Our first numerical example is of a single locus with three alleles (Table A1). The
case of a single locus with any number of alleles was analytically treated by Kempthorne
(1957). The equating of coefficients along the ith row of (A26) leads to the matrix of
equations

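$$\begin{pmatrix}
J_{i1} = \dfrac{1}{P_{11}} \dfrac{\partial P_{11}}{\partial p_i} & J_{i1} = \dfrac{1}{P_{12}} \dfrac{\partial P_{12}}{\partial p_i} & J_{i2} = \dfrac{1}{P_{21}} \dfrac{\partial P_{21}}{\partial p_i} \\
J_{i2} = \dfrac{1}{P_{22}} \dfrac{\partial P_{22}}{\partial p_i} & J_{i1} = \dfrac{1}{P_{13}} \dfrac{\partial P_{13}}{\partial p_i} & J_{i3} = \dfrac{1}{P_{31}} \dfrac{\partial P_{31}}{\partial p_i} \\
J_{i3} = \dfrac{1}{P_{33}} \dfrac{\partial P_{33}}{\partial p_i} & J_{i2} = \dfrac{1}{P_{23}} \dfrac{\partial P_{23}}{\partial p_i} & J_{i3} = \dfrac{1}{P_{32}} \dfrac{\partial P_{32}}{\partial p_i}
\end{pmatrix} \qquad (A30)$$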



for allele i. The notation $P_{ij}$ now means the ordered genotype with alleles i and j. This
matrix gives a set of nine conditions plus conservation of probability that must be satisfied
to ensure the equality of (A25). However, given that there are only six unique genotypes,
these conditions are overdetermined and will not necessarily be solvable. We can attempt
to formulate a solvable set by combining these conditions. We can see that the second
and third elements in a given row of this matrix must equal the sum of the elements in
the first column corresponding to the homozygous bearers of the relevant alleles. For
example,

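$$\frac{1}{P_{12}} \frac{\partial P_{12}}{\partial p_i} + \frac{1}{P_{21}} \frac{\partial P_{21}}{\partial p_i} = \frac{1}{P_{11}} \frac{\partial P_{11}}{\partial p_i} + \frac{1}{P_{22}} \frac{\partial P_{22}}{\partial p_i}, \qquad (A31)$$

and these equations lead collectively to the three conserved measures of Hardy-Weinberg
disequilibrium

$$\lambda_{12} = \frac{P_{12}^2}{P_{11} P_{22}}, \qquad \lambda_{13} = \frac{P_{13}^2}{P_{11} P_{33}}, \qquad \lambda_{23} = \frac{P_{23}^2}{P_{22} P_{33}}. \qquad (A32)$$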

Two of the allele frequencies and these three conserved quantities appear to be a complete
specification of the six genotype frequencies. By the implicit function theorem, invertibility
of the Jacobian at any solution $(p_1, p_2, \lambda_{12}, \lambda_{13}, \lambda_{23})$ specifying a valid vector of
genotype frequencies ensures that there are unique solutions for small perturbations of
the allele frequencies. Numerical testing suggests that invertibility of the Jacobian is a
generic property of this five-dimensional system.
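The conserved measures are easy to evaluate for the genotype frequencies of Table A1. The sketch below takes each measure to be the squared frequency of the ordered heterozygote divided by the product of the two homozygote frequencies, and it assumes that each unordered heterozygote frequency is split equally between the two orderings. Exact rational arithmetic is used so that the values can be compared with the fractions quoted in the text; the helper names are hypothetical.

```python
from fractions import Fraction as F

# Unordered genotype frequencies from Table A1, keyed by allele labels.
freq = {(1, 1): F(2, 10), (2, 2): F(2, 10), (3, 3): F(2, 10),
        (1, 2): F(2, 10), (1, 3): F(1, 10), (2, 3): F(1, 10)}

def ordered(a, b):
    """Frequency of the ordered genotype; heterozygotes are split in half."""
    key = (min(a, b), max(a, b))
    return freq[key] if a == b else freq[key] / 2

def hw_measure(a, b):
    """Squared ordered-heterozygote frequency over the homozygote product."""
    return ordered(a, b) ** 2 / (ordered(a, a) * ordered(b, b))

lambdas = {pair: hw_measure(*pair) for pair in [(1, 2), (1, 3), (2, 3)]}
p1 = sum(ordered(1, b) for b in (1, 2, 3))   # frequency of allele A1
```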

Given the numerical values in Table A1, what is the experimental average effect of
substituting $A_2$ for $A_1$? There are three ways in which this gene substitution can be
brought about: $A_1A_1 \to A_1A_2$, $A_1A_2 \to A_2A_2$, and $A_1A_3 \to A_2A_3$. The causal effects
of these three substitutions are 1, 2, and -1 respectively.

We first attempt to satisfy the weaker criterion that flMl) is equal to zero by determining 
which weighted average of the first two substitutions yields the smallest absolute value of 
$\epsilon\theta$. To calculate a discrete approximation of the $\theta_{ij}$, we use a population size of 10,000.
Table A1: A trait affected by a single triallelic locus.

genotype    E[Y | do(·)]    frequency    ε
A1A1        10              .2           -.3402778
A2A2        13              .2            .2152778
A3A3        12              .2           -.6875
A1A2        11              .2           -.5625
A1A3        14              .1           2.4861111
A2A3        13              .1            .2638889


We examine all integer weights such that the weights sum to 90. There are 91 such 
weighted averages, and it turns out that the weights (70, 20) yield the minimum. In fact, 
the absolute value of $\epsilon\theta$ yielded by these weights is roughly 1.5 x 10~^^, which is nearly
within machine error of zero. The 90 other weighted averages lead to absolute values of
$\epsilon\theta$ exceeding 1 x 10~^.

These weights lead to an experimental average effect, $\alpha_2 - \alpha_1$, equaling 11/9. In the
case of a single locus, the regression average effects (which we now denote by $\beta$) do not
require the imposition of (17) to be identified, and the calculations yielding the values of
the $\epsilon_{ij}$ in Table A1 also give us (-0.7798611, 0.4423611, 0.39375) as the numerical value
of $(\beta_1, \beta_2, \beta_3)$. It appears that $\beta_2 - \beta_1$ is exactly equal to 11/9.
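The regression average effects quoted above can be reproduced in exact arithmetic. The following sketch assumes that each heterozygote frequency in Table A1 is split equally between the two ordered genotypes, and it solves the least-squares normal equations (allele frequency times average excess equals the frequency-weighted sum of fitted values) by Gauss-Jordan elimination. The helper names are hypothetical and not taken from the original computations.

```python
from fractions import Fraction as F

# Experimental means and unordered genotype frequencies from Table A1.
G = {(1, 1): 10, (2, 2): 13, (3, 3): 12, (1, 2): 11, (1, 3): 14, (2, 3): 13}
freq = {(1, 1): F(1, 5), (2, 2): F(1, 5), (3, 3): F(1, 5),
        (1, 2): F(1, 5), (1, 3): F(1, 10), (2, 3): F(1, 10)}
alleles = [1, 2, 3]

def key(a, b):
    return (min(a, b), max(a, b))

def P(a, b):
    """Ordered genotype frequency (heterozygotes split in half)."""
    return freq[key(a, b)] if a == b else freq[key(a, b)] / 2

mu = sum(P(a, b) * G[key(a, b)] for a in alleles for b in alleles)
p = {a: sum(P(a, b) for b in alleles) for a in alleles}

# Normal equations: sum_b P(a,b)*(G_ab - mu) = p_a*beta_a + sum_b P(a,b)*beta_b.
M = [[(p[a] if a == b else 0) + P(a, b) for b in alleles] for a in alleles]
rhs = [sum(P(a, b) * (G[key(a, b)] - mu) for b in alleles) for a in alleles]

def solve(M, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(rhs)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(n):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][n] for r in range(n)]

beta = solve(M, rhs)
difference = beta[1] - beta[0]   # beta_2 - beta_1
```

Working with `Fraction` rather than floating point makes it possible to confirm that the difference is exactly the rational number reported in the text, not merely close to it.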

We can use a different pair of substitutions, say $A_1A_2 \to A_2A_2$ and $A_1A_3 \to A_2A_3$,
to yield the experimental average effect $\alpha_2 - \alpha_1$. We examine all integer weightings of
these two substitutions such that the weights sum to 270. It turns out that the weighting
(200, 70) yields the minimum. The absolute value of $\epsilon\theta$ yielded by these weights is
roughly 4 x 10^^®, again nearly within machine error of zero, whereas the 270 other
weighted averages all lead to absolute values of $\epsilon\theta$ exceeding 3 x 10~^. These minimizing
weights again lead to an experimental average effect of 11/9. It is rather interesting that
the neighboring weights (199, 71) and (201, 69) lead to such higher values of $\epsilon\theta$ despite
the numerical closeness of these weighted averages and the fineness of our discretization.
In fact, we have chosen to present this example because of this phenomenon, which we
conjecture to be related to the fact that $\alpha_2 - \alpha_1$ happens to be rational and thus
exactly equal to some integer-weighted average of the causal effects.
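Both minimizing weightings can be verified by direct arithmetic on the causal effects read from Table A1 (1 for the first substitution, 2 for the second, and -1 for the third). The short sketch below uses exact fractions; the function name is hypothetical.

```python
from fractions import Fraction as F

def weighted_average_effect(effects, weights):
    """Weighted average of causal effects with integer weights."""
    return F(sum(e * w for e, w in zip(effects, weights)), sum(weights))

# A1A1 -> A1A2 (effect 1) and A1A2 -> A2A2 (effect 2), weights (70, 20).
first = weighted_average_effect([1, 2], [70, 20])
# A1A2 -> A2A2 (effect 2) and A1A3 -> A2A3 (effect -1), weights (200, 70).
second = weighted_average_effect([2, -1], [200, 70])
# A neighboring weighting, for comparison.
neighbor = weighted_average_effect([2, -1], [199, 71])
```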

Evidently it should not be possible to obtain a valid average effect by using only the 
substitutions $A_1A_1 \to A_1A_2$ and $A_1A_3 \to A_2A_3$. Examining all integer weights summing
to 1000, we find that $\epsilon\theta$ declines linearly from (0, 1000) to (1000, 0); the absolute minimum
of $\epsilon\theta$ is thus attained at a boundary, and it is not especially small (~ 2 x 10~^).

We examine whether our conception of individual average effects is valid. Using the 
method of minimizing $\epsilon\theta$, we find that $\alpha_2 - \alpha_3$ is approximately .049. According to our
notion of substituting $A_2$ for a random homologous gene, $\alpha_2$ must be equal to $p_1(\alpha_2 -
\alpha_1) + p_3(\alpha_2 - \alpha_3)$. In our example $(p_1, p_2, p_3)$ happens to be (.35, .35, .30), which leads
to .4425 as the approximate numerical value of $\alpha_2$. This is in good agreement with $\beta_2$.
Continuing this exercise, we can satisfy ourselves that $(\alpha_1, \alpha_2, \alpha_3)$ and $(\beta_1, \beta_2, \beta_3)$ are
equal. 
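The step from pairwise differences to an individual average effect is a frequency-weighted sum, which follows because the frequency-weighted mean of the average effects is zero at the locus. A minimal sketch using the values quoted in the text (11/9 for the first difference, the rounded value .049 for the second); the variable names are hypothetical.

```python
from fractions import Fraction as F

p1, p3 = F(35, 100), F(30, 100)   # allele frequencies of A1 and A3
diff_21 = F(11, 9)                # alpha_2 - alpha_1 (exact)
diff_23 = 0.049                   # alpha_2 - alpha_3 (rounded value from the text)

# alpha_2 = p1*(alpha_2 - alpha_1) + p3*(alpha_2 - alpha_3)
alpha_2 = float(p1 * diff_21) + float(p3) * diff_23
```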

We now attempt to satisfy the stronger criterion that the quantities in (A32) remain
constant. The numerical value of $(p_1, p_2, \lambda_{12}, \lambda_{13}, \lambda_{23})$ is (35/100, 35/100, 1/4, 1/16,
1/16), and a perturbation of (-1/1000, 1/1000, 0, 0, 0) leads to a numerical solution that
specifies another valid vector of genotype frequencies. The weighting of the possible gene
substitutions satisfying the changes in genotype frequencies is typically not unique. In a
population of size $10^8$, one permissible vector of weights for our example can be reasonably
well approximated by

$$\begin{array}{ccccc}
A_1A_1 & \xrightarrow{\;88{,}821\;} & A_1A_2 & \xrightarrow{\;88{,}951\;} & A_2A_2 \\
 & & & & \uparrow {\scriptstyle 6} \\
A_3A_3 & \xleftarrow{\;6\;} & A_1A_3 & \xrightarrow{\;22{,}222\;} & A_2A_3
\end{array} \qquad (A33)$$

where the label of each arrow indicates how many gene substitutions of that kind are to
be performed. Notice that there are 12 gene substitutions involving a genotype containing
the allele $A_3$. For each $A_3$ gene created by $A_1A_3 \to A_3A_3$, another $A_3$ is destroyed by
$A_2A_3 \to A_2A_2$, and the net result is the same frequency of $A_3$. These 12 substitutions
turn out to be a way of decreasing the number of $A_1$ genes and increasing the number
of $A_2$ without directly converting one to the other. We might as well pair each $A_1A_3 \to
A_3A_3$ with $A_2A_3 \to A_2A_2$, treating each such pair as a single substitution. The weighted
average of the gene substitutions is then

$$\frac{88{,}821(1) + 88{,}951(2) + 22{,}222(-1) + 6(-2 + 0)}{88{,}821 + 88{,}951 + 22{,}222 + 6},$$

which diverges from 11/9 at the fourth decimal place.
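The size of the divergence can be confirmed directly. The sketch below recomputes the weighted average from the substitution counts in (A33) and the causal effects implied by Table A1, treating each paired substitution through $A_3$ as a single step with effect -2 + 0; the variable names are hypothetical.

```python
from fractions import Fraction as F

# (number of substitutions, causal effect) for each kind of step in (A33).
steps = [
    (88821, 1),        # A1A1 -> A1A2
    (88951, 2),        # A1A2 -> A2A2
    (22222, -1),       # A1A3 -> A2A3
    (6, -2 + 0),       # paired A1A3 -> A3A3 and A2A3 -> A2A2
]
average = F(sum(n * e for n, e in steps), sum(n for n, _ in steps))
gap = float(average) - 11 / 9
```

The total of 200,000 net conversions of $A_1$ into $A_2$ matches a change of 1/1000 in allele frequency among the $2 \times 10^8$ genes of a population of $10^8$ diploid individuals.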

We now apply our argument to the case of two biallelic loci. Here we will encounter 
a contradiction. 

The equating of coefficients along the row of (A26) corresponding to allele $A_i$ now
leads to the matrix of equations 

/ T I T _ 1 0-Pii,ii 7 I 7 _ 1 9fi2.ii 7 I 7 _ 1 '9-P2i,ii 7 I 7 — 1 8f22,ll \ 

/ -^'^ ^ -^'^ ~ Pii.ii apf-' '^ ^ ^^ ~ A2,ii ap('=) ^2 + -Jts - P2^_^^ gp(fc) 'Ji2 + -JiA - P22_^^ g^(fc) \ 

7 _l_ 7 — 1 9fll,12 T I T _ 1 9fl2,12 T I 7 _ 1 i9f21,12 7 _i_ 7 _ 1 9^22,12 

-Jii + -'i3 - p^^^2 aptfc) "^^1 ^ "^^4 - p^2_j2 dpO'i "^^2 -h Ji'i - P2^^2 (ft) 'Ji2 -h -Ju - P22^^2 apC--) 

k k k k 

T I T _ 1 Qfll,21 7 I 7 _ 1 9Pl2,21 7 _|_ 7 _ 1 9^21,21 7 i 7 _ 1 9^22,21 

-Jil + Ji3 - p^^ 21 dp\^^ '^ '^ ~ ■Pl2,2l ap('=) "^^2 + -JiZ - P2^_2j ^p{fc) •Ji2 + -Jii - p22_2i gp(fc) 

T I T 1 9fll,22 7 I 7 1 t?Pl2,22 7 I 7 1 i9f21,22 7 I 7 1 8^22,22 

yH + -'i3 - pj^ 22 Q^ik) Ji\ + -'i4 - p^2,22 ap('=) "^^2 + Jt'i - p2^ 22 ^^(fc) -'i2 + -'i4 " P22 22 g^Cfc) j 



»fc ' »fc " »fc 



(A34) 
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plus conservation of probability that must be satisfied to ensure the equality of (]A25|) . 
An argument analogous to the one below ( 1A30I) shows that six quantities of the form 



\[
\lambda_{ij} = \frac{P_{ij}^{2}}{P_{ii} P_{jj}}
\tag{A35}
\]



must be conserved. If we do not assume that the double heterozygotes are phenotypically 
equivalent, then these six measures of Hardy-Weinberg disequilibrium, the allele frequencies 
at the two loci, and conservation of probability leave one more condition needed to specify 
the ten genotype frequencies. 
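
As a small illustration of these six measures, the sketch below evaluates them under random union of gametes. It rests on two assumptions that are not stated in this passage: that a quantity of the form (A35) is the squared frequency of the ordered heterozygote divided by the product of the two corresponding homozygote frequencies, and that genotype frequencies are ordered (each heterozygote counted in both orders). The gamete frequencies are hypothetical and are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical gamete frequencies at two biallelic loci (illustrative values only).
gamete_freq = {
    "11": Fraction(1, 10),
    "12": Fraction(2, 10),
    "21": Fraction(3, 10),
    "22": Fraction(4, 10),
}

# Random union of gametes: ordered genotype frequency P[i, j] = p_i * p_j.
P = {(i, j): gamete_freq[i] * gamete_freq[j]
     for i in gamete_freq for j in gamete_freq}

def hw_measure(P, i, j):
    """Assumed form of (A35): P_ij^2 / (P_ii * P_jj) for gametes i and j."""
    return P[i, j] ** 2 / (P[i, i] * P[j, j])

# One measure per unordered pair of distinct gametes: 4 choose 2 = 6.
measures = {(i, j): hw_measure(P, i, j) for i, j in combinations(gamete_freq, 2)}

for pair, value in measures.items():
    print(pair, value)
```

Under these assumptions all six measures equal one, which gives the random-mating reference value against which departures are measured.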

Rearrange each element of (A34) to put the genotype frequency on one side and form 
the four column sums. Each such sum is the marginal frequency of a gamete. For example, 
we have 



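\[
\sum_{cd} \frac{\partial P_{11,cd}}{\partial p_i^{(k)}} = \left(J_{i1} + J_{i3}\right) \sum_{cd} P_{11,cd},
\tag{A36}
\]

which implies that 

\[
J_{i1} + J_{i3} = \frac{1}{P_{11}} \frac{\partial P_{11}}{\partial p_i^{(k)}}.
\tag{A37}
\]

Combining all columns, we get 

\[
\frac{1}{P_{11}} \frac{\partial P_{11}}{\partial p_i^{(k)}} + \frac{1}{P_{22}} \frac{\partial P_{22}}{\partial p_i^{(k)}}
= \frac{1}{P_{12}} \frac{\partial P_{12}}{\partial p_i^{(k)}} + \frac{1}{P_{21}} \frac{\partial P_{21}}{\partial p_i^{(k)}},
\tag{A38}
\]

which yields the condition that 

\[
\zeta = \frac{P_{11} P_{22}}{P_{12} P_{21}}
\tag{A39}
\]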
remains constant. $\zeta$ is the measure introduced by Kimura (1965), and the multi-index 
notation immediately reveals that it is equal to unity in linkage equilibrium. 
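
The statement about linkage equilibrium can be verified with a short sketch. It takes Kimura's measure to be the ratio of gamete frequencies $P_{11}P_{22}/(P_{12}P_{21})$; the allele frequencies and the disequilibrium value `D` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def gametic_ratio(P11, P12, P21, P22):
    """Ratio P11 * P22 / (P12 * P21) of the four gamete frequencies."""
    return (P11 * P22) / (P12 * P21)

# Hypothetical allele frequencies (not from the paper).
p1 = Fraction(3, 10)   # frequency of allele 1 at locus 1
q1 = Fraction(6, 10)   # frequency of allele 1 at locus 2

# Linkage equilibrium: each gamete frequency is a product of allele frequencies.
equilibrium = (p1 * q1, p1 * (1 - q1), (1 - p1) * q1, (1 - p1) * (1 - q1))

# The same allele frequencies with a positive disequilibrium D added.
D = Fraction(1, 20)
disequilibrium = (equilibrium[0] + D, equilibrium[1] - D,
                  equilibrium[2] - D, equilibrium[3] + D)

print(gametic_ratio(*equilibrium))
print(gametic_ratio(*disequilibrium))
```

In equilibrium the products in the numerator and denominator are identical, so the ratio is exactly one; an excess of coupling gametes pushes it above one.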

The equality of the regression and experimental average effects for constant 
$\boldsymbol{\lambda} = (\lambda_{11,21}, \ldots, \lambda_{12,22}, \zeta)$ appears to conflict with the result of 
Nagylaki (1976) that the stipulation of $\Delta\zeta = 0$ and random mating to reset the 
$\lambda_{ij}$ to unities among zygotes does 
not lead to the change in the mean phenotype equaling the summed products of average 
Table A2: A trait affected by two biallelic loci. 

genotype                                          E[Y | do(·)]   frequency   ε
$A_1^{(1)}A_1^{(2)} / A_1^{(1)}A_1^{(2)}$         17             .054         5.0100265
$A_1^{(1)}A_2^{(2)} / A_1^{(1)}A_1^{(2)}$         12             .036        -1.438691
$A_1^{(1)}A_2^{(2)} / A_1^{(1)}A_2^{(2)}$         13             .257        -.8874187
$A_1^{(1)}A_1^{(2)} / A_2^{(1)}A_1^{(2)}$         14             .140        -.3345667
double heterozygote                               18             .080        -.7832893
double heterozygote                               10             .039        -4.7832893
$A_1^{(1)}A_2^{(2)} / A_2^{(1)}A_2^{(2)}$         16             .066         4.7679882
$A_2^{(1)}A_1^{(2)} / A_2^{(1)}A_1^{(2)}$         15             .041        -.6791599
$A_2^{(1)}A_1^{(2)} / A_2^{(1)}A_2^{(2)}$         11             .029        -3.2178824
$A_2^{(1)}A_2^{(2)} / A_2^{(1)}A_2^{(2)}$         20             .258         .4233950

effects and changes in allele frequencies (in the case that the phenotype is fitness). Our 
next numerical example shows that we have indeed reached a contradiction (Table A2). 
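
Before working with the numerical example, it is worth confirming that the tabulated frequencies satisfy conservation of probability. The sketch below copies the second and third columns of Table A2 in table order; treating the second column as the genotypic value is an assumption about the column heading, and the variable names are illustrative.

```python
from decimal import Decimal

# (genotypic value, frequency) for the ten rows of Table A2, in table order.
rows = [
    (17, "0.054"), (12, "0.036"), (13, "0.257"), (14, "0.140"), (18, "0.080"),
    (10, "0.039"), (16, "0.066"), (15, "0.041"), (11, "0.029"), (20, "0.258"),
]

# Decimal arithmetic keeps the three-digit frequencies exact.
total_frequency = sum(Decimal(freq) for _, freq in rows)
mean_phenotype = sum(value * Decimal(freq) for value, freq in rows)

print(total_frequency)
print(mean_phenotype)
```

The ten frequencies sum to exactly one, and the frequency-weighted mean of the second column is 15.631.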

Numerical testing suggests that invertibility of the Jacobian is also a generic property 
of the nine-dimensional system $\{p^{(1)}, p^{(2)}, \lambda_{11,21}, \ldots, \lambda_{12,22}, \zeta\}$. We numerically update the 
vector of genotype frequencies in Table A2 by increasing the frequency of allele $A_2^{(1)}$ by 
a small increment $\Delta p^{(1)}$, a negative power of ten. The regression average effect at locus 1, 
as determined by the Levenberg-Marquardt algorithm, is approximately 2.4934. However, when we 
multiply this by $2\Delta p^{(1)}$, the result does not closely agree with $G_{ij}\,\Delta P_{ij}$. The discrepancy 
is close to 12 percent and does not diminish as $\Delta p^{(1)}$ is made smaller. We conclude that we 
have falsified our initial assumption that a residual-free description of the average effects 
always exists. 

Sampling vectors of initial genotype frequencies from the Dirichlet distribution, we 
find that the changes implied by constancy of $\boldsymbol{\lambda}$ in the case of two biallelic loci do not 
typically produce such a large discrepancy. The error is usually less than 7 percent. This 
suggests to us that there may exist a subset of weights, distinguished by the changes in 
the departures from random combination all being "small" in some sense, that can be 
mathematically described. We leave this issue to future research. 
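
The sampling step can be sketched as follows. This is a minimal illustration using only the standard library, not the procedure actually used for the reported results: the concentration parameters (all equal to one, giving a uniform distribution on the simplex), the seed, and the function name `sample_dirichlet` are assumptions.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet vector by normalizing independent gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [draw / total for draw in draws]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Ten genotype classes for two biallelic loci; unit concentrations are assumed.
genotype_freqs = sample_dirichlet([1.0] * 10, rng)

print(genotype_freqs)
print(sum(genotype_freqs))
```

Each draw is a vector of ten positive frequencies summing to one (up to floating-point rounding), which can serve as the initial genotype frequencies for one numerical trial.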

The vanishing of $\epsilon$ is still an applicable criterion. For example, the genotype 
$A_1^{(1)}A_2^{(2)} / A_1^{(1)}A_1^{(2)}$ can be transformed into either double heterozygote, depending on whether the 
left or right gene at locus 1 is the target of the substitution. In one case the causal effect 
is 6, and in the other it is $-2$. Among all integer weightings of these two substitutions 
summing to 1000, the weights (562, 438) yield the minimum. The corresponding weighted 
average of the causal effects, $\alpha_2 - \alpha_1 = (562 \cdot 6 + 438 \cdot (-2))/1000$, equals 2.496 and is also 
the closest to $\beta_2 - \beta_1 \approx 2.493$ that can be obtained given our discretization. The replacement 
of randomly chosen homologous genes can now be used to determine $(\alpha_1, \alpha_2)$. 
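
The discretized search over weights can be reproduced with a short sketch. Note the swapped-in criterion: the text selects the weights by the vanishing of the residual, whereas this sketch uses closeness to the regression value 2.4934 quoted earlier, which the text reports as picking out the same weights. All names are illustrative.

```python
TOTAL = 1000                   # the integer weights must sum to this
EFFECT_A, EFFECT_B = 6, -2     # causal effects of the two possible substitutions
TARGET = 2.4934                # regression average effect at locus 1 (from the text)

def weighted_effect(w):
    """Weighted average causal effect when w of the TOTAL weights go to EFFECT_A."""
    return (w * EFFECT_A + (TOTAL - w) * EFFECT_B) / TOTAL

# Exhaustive search over every integer split of the weights.
best_w = min(range(TOTAL + 1), key=lambda w: abs(weighted_effect(w) - TARGET))
best_weights = (best_w, TOTAL - best_w)

print(best_weights, weighted_effect(best_w))
```

Moving one unit of weight changes the average by 8/1000 = 0.008, so the attainable values nearest the target are 2.488 and 2.496, and the latter is closer.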